Head-related transfer function convolution method and head-related transfer function convolution device

ABSTRACT

A head-related transfer function (HRTF) convolution method arranged, when an audio signal is reproduced acoustically by an electro-acoustic conversion unit disposed in a nearby position of both ears of a listener, to convolute an HRTF into the audio signal, which allows the listener to listen to the audio signal such that a sound image is localized in a perceived virtual sound image localization position, the method including the steps of: measuring, when a sound source is disposed in the virtual sound image localization position, and a sound-collecting unit is disposed in the position of the electro-acoustic conversion unit, a direct-wave direction HRTF regarding the direction of a direct wave, and reflected-wave direction HRTFs regarding the directions of selected one or more reflected waves, from the sound source to the sound-collecting unit, separately beforehand; and convoluting the obtained direct-wave direction HRTF, and the reflected-wave direction HRTFs into the audio signal.

CROSS REFERENCES TO RELATED APPLICATIONS

The present application claims the benefit under 35 U.S.C. §120 as a divisional application of U.S. patent application Ser. No. 12/366,095 filed Feb. 5, 2009 under Attorney Docket No. 51459.70563US00 and entitled “HEAD-RELATED TRANSFER FUNCTION CONVOLUTION METHOD AND HEAD-RELATED TRANSFER FUNCTION CONVOLUTION DEVICE,” which contains subject matter related to Japanese Patent Application JP 2008-045597 filed in the Japanese Patent Office on Feb. 27, 2008, the entire contents of both of which are incorporated herein by reference.

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to a convolution method and convolution device for convoluting into an audio signal a head-related transfer function (hereafter abbreviated to “HRTF”) for enabling a listener to hear a sound source situated in front or the like of the listener, during acoustic reproduction with an electric-acoustic unit such as an acoustic reproduction driver of headphones for example, which is disposed near the ears of the listener.

2. Description of the Related Art

In a case of the listener wearing the headphones on the head for example, and listening to acoustically reproduced signals with both ears, if the audio signals reproduced at the headphones are commonly-employed audio signals supplied to speakers disposed to the left and right in front of the listener, the so-called lateralization phenomenon, wherein the reproduced sound image stays within the head of the listener, occurs.

A technique called virtual sound image localization is disclosed in WO95/13690 Publication and Japanese Unexamined Patent Application Publication No. 03-214897, for example, as having solved this problem of the lateralization phenomenon. This virtual sound image localization enables the sound image to be reproduced (virtually localized in the relevant position) such that when reproduced with a headphone or the like, the sound image is reproduced as if there were a sound source, e.g., speakers in a predetermined perceived position, such as the left and right in front of the listener, and is realized as described below.

FIG. 30 is a diagram for describing a technique of virtual sound image localization in a case of reproducing two-channel stereo signals of left and right with two-channel stereo headphones, for example.

As shown in FIG. 30, at a position nearby both ears of the listener regarding which placement of two acoustic reproduction drivers such as two-channel stereo headphones for example (an example of an electro-acoustic conversion unit) is assumed, microphones (an example of an acousto-electric conversion unit) ML and MR are disposed, and also speakers SPL and SPR are disposed at positions at which virtual sound image localization is desired.

In a state where a dummy head 1 (alternatively, this may be a human, the listener himself/herself) is present, an acoustic reproduction of an impulse for example, is performed at one channel, the left channel speaker SPL for example, and the impulse emitted by that reproduction is picked up with each of the microphones ML and MR and an HRTF for the left channel is measured. In the case of this example, the HRTF is measured as an impulse response.

In this case, the impulse response serving as the left channel HRTF includes, as shown in FIG. 30, an impulse response HLd of the sound waves from the left channel speaker SPL picked up with the microphone ML (hereinafter, referred to as “impulse response of left primary component”), and an impulse response HLc of the sound waves from the left channel speaker SPL picked up with the microphone MR (hereinafter, referred to as “impulse response of left crosstalk component”).

Next, an acoustic reproduction of an impulse is performed at the right channel speaker SPR in the same way, the impulse emitted by that reproduction is picked up with each of the microphones ML and MR, and the HRTF for the right channel is measured as an impulse response.

In this case, the impulse response serving as the right channel HRTF includes an impulse response HRd of the sound waves from the right channel speaker SPR picked up with the microphone MR (hereinafter, referred to as “impulse response of right primary component”), and an impulse response HRc of the sound waves from the right channel speaker SPR picked up with the microphone ML (hereinafter, referred to as “impulse response of right crosstalk component”).

The impulse responses for the HRTF of the left channel and the HRTF of the right channel are convoluted, as they are, with the audio signals supplied to the acoustic reproduction drivers for the left and right channels of the headphones, respectively. That is to say, the impulse response of left primary component and impulse response of left crosstalk component, serving as the left channel HRTF obtained by measurement, are convoluted, as they are, with the left channel audio signals, and the impulse response of right primary component and impulse response of right crosstalk component, serving as the right channel HRTF obtained by measurement, are convoluted, as they are, with the right channel audio signals.
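The related-art processing above can be sketched in Python as follows. This is only an illustrative sketch, not the circuit of the cited publications: the function name `binaural_render` and the use of NumPy are assumptions, and the routing of each crosstalk component to the opposite-side driver is one reading of FIG. 30. The two input signals are assumed to have equal length, as are the four impulse responses.

```python
import numpy as np

def binaural_render(x_left, x_right, h_ld, h_lc, h_rd, h_rc):
    """Convolve two-channel audio with measured impulse responses.

    h_ld: SPL -> ML (left primary),  h_lc: SPL -> MR (left crosstalk)
    h_rd: SPR -> MR (right primary), h_rc: SPR -> ML (right crosstalk)
    """
    # Signal for the left driver: left channel through its primary
    # response plus right channel through its crosstalk response.
    out_left = np.convolve(x_left, h_ld) + np.convolve(x_right, h_rc)
    # Signal for the right driver, built symmetrically.
    out_right = np.convolve(x_right, h_rd) + np.convolve(x_left, h_lc)
    return out_left, out_right
```

Because the measured responses are used as they are, any reflections of the measurement room contained in them are carried into both outputs unchanged.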

This enables sound image localization (virtual sound image localization) such that sound is perceived to be just as if it were being reproduced from speakers disposed to the left and right in front of the listener in the case of two-channel stereo audio of left and right for example, even though the acoustic reproduction is nearby the ears of the listener.

A case of two channels has been described above, but with a case of three or more channels, this can be performed in the same way by disposing speakers at the virtual sound image localization positions for each of the channels, reproducing impulses for example, measuring the HRTF for each channel, and convoluting the impulse responses of the HRTFs obtained by measurement with the audio signals supplied to the drivers for the acoustic reproduction by the two channels, left and right, of the headphones.

SUMMARY OF THE INVENTION

Incidentally, when a place where measurement of an HRTF is performed is not an anechoic chamber, not only a direct wave from a perceived sound source (corresponding to a virtual sound image localization position) but also the components of a reflected wave such as shown by a dotted line in FIG. 30 are included (without being separated) in a measured HRTF. Therefore, a measured HRTF according to the related art includes the properties of the relevant measurement place according to the shape of a chamber or place or the like where measurement has been performed, and a material such as a wall, ceiling, floor, or the like where a sound wave is reflected.

In order to eliminate properties of the room or place where measurement is performed, measuring in an anechoic chamber, where there are no reflections from the floor, ceiling, walls, and so forth, can be conceived. However, in the event of convoluting HRTFs measured in an anechoic chamber as they are into audio signals, there is a problem that virtual sound image localization and orientation are somewhat fuzzy since there are no reflected waves in the case of attempting to virtually localize a sound image.

Accordingly, with the related art, measurement of HRTFs to be used as they are for convolution with audio signals is not performed in an anechoic chamber, but rather, HRTFs are measured in a room with a certain amount of reverberation. Further, there has been proposed an arrangement wherein a menu of rooms or places where the HRTFs were measured, such as a studio, hall, large room, and so forth, is presented to the user, so that the user who wants to enjoy music with virtual sound image localization can select the HRTF of a desired room or place from the menu.

However, as described above, with the related art, measurement of HRTFs is performed with not only impulse responses of direct waves from a perceived sound source position but also accompanying impulse responses from reflected waves without being able to separate the impulse response of direct waves and reflected waves, including both, so only an HRTF according to a measured place or room is obtainable, and accordingly, it has been difficult to obtain an HRTF according to a desired ambient environment or room environment, and convolute this into an audio signal. For example, it has been difficult to convolute an HRTF corresponding to a perceived listening environment into an audio signal such as where speakers are disposed in front on a vast plain which has neither walls nor obstructions thereabout.

Also, in the case of attempting to obtain an HRTF in a room having a perceived predetermined shape and inner volume, and a wall of a predetermined degree of sound absorption (corresponding to the attenuation rate of a sound wave), heretofore, there has been no way other than to look for or fabricate such a room and to measure and obtain an HRTF in this room. However, in reality, it is difficult to look for or fabricate such a desired listening environment or room, and presently used techniques are not sufficient to convolute an HRTF corresponding to a desired arbitrary listening environment or room environment into an audio signal.

It has been found desirable to provide a head-related transfer function convolution method and device, which enables convolution of an HRTF corresponding to a desired arbitrary listening environment or room environment to be performed, and a desired virtual sound image localization feeling to be obtained.

A head-related transfer function convolution method according to an embodiment of the present invention is arranged, when an audio signal is reproduced acoustically by an electro-acoustic conversion unit disposed in a nearby position of both ears of a listener, to convolute a head-related transfer function into the audio signal, which allows the listener to listen to the audio signal such that a sound image is localized in a perceived virtual sound image localization position, the head-related transfer function convolution method including the steps of: measuring, when a sound source is disposed in the virtual sound image localization position, and a sound-collecting unit is disposed in the position of the electro-acoustic conversion unit, a direct wave direction head-related transfer function regarding the direction of a direct wave from the sound source to the sound-collecting unit, and a reflected wave direction head-related transfer function regarding the direction of selected one reflected wave or reflected wave direction head-related transfer functions regarding the directions of selected multiple reflected waves, from the sound source to the sound-collecting unit, to obtain such head-related transfer functions, separately beforehand; and convoluting the obtained direct wave direction head-related transfer function, and the reflected wave direction head-related transfer function regarding the direction of the selected one reflected wave or the reflected wave direction head-related transfer functions regarding the directions of the selected multiple reflected waves, into the audio signal.

Heretofore, as described above, integral head-related transfer functions including both of a direct wave direction head-related transfer function and reflected wave direction head-related transfer function are measured, and are convoluted into an audio signal without change. On the other hand, with the above configuration, at a head-related transfer function measuring process a direct wave direction head-related transfer function and reflected wave direction head-related transfer function are measured separately beforehand. Subsequently, the obtained direct wave direction head-related transfer function and reflected wave direction head-related transfer function are convoluted into an audio signal.

Here, the direct wave direction head-related transfer function is a head-related transfer function obtained from only a sound wave for measurement directly input to a sound-collecting unit from a sound source disposed in a perceived virtual sound image localization position, and does not include the components of a reflected wave.

Also, the reflected wave direction head-related transfer function is a head-related transfer function obtained from only a sound wave for measurement directly input to a sound-collecting unit from a perceived reflected wave direction, and does not include components that are emitted from a sound source in the relevant reflected wave direction, reflected at some surface, and then input to the sound-collecting unit.

Subsequently, in the measuring, as described above, a head-related transfer function for a direct wave, and a head-related transfer function for a reflected wave are obtained separately when a virtual sound image localization position is a sound source, but at this time, as a reflected wave direction for obtaining a reflected wave direction head-related transfer function, one or multiple reflected wave directions are selected according to a perceived listening environment or room environment.

For example, in the case of assuming that a listening environment is a vast plain, there are neither surrounding walls nor a ceiling, and there are only a direct wave from a sound source perceived in a virtual sound image localization position, and a sound wave reflected at the ground surface or floor from the sound source, and accordingly, a direct wave direction head-related transfer function, and a reflected wave direction head-related transfer function in the direction of a reflected wave from the ground surface or floor are obtained, and these head-related transfer functions are convoluted into an audio signal.
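For the vast-plain case, the two sound paths can be illustrated with a small geometric sketch. The mirror-image construction used here is an assumption introduced only for illustration (the text does not prescribe how the reflected path is obtained), and the function name `plain_path_lengths` and its parameters are hypothetical. The ground is taken as a flat plane, with the source and the ear at given heights above it.

```python
import math

def plain_path_lengths(distance, source_height, ear_height):
    """Return (direct, ground_reflected) path lengths in metres.

    distance: horizontal distance between source and listener.
    Assumes a flat, infinite ground plane (illustrative only).
    """
    direct = math.hypot(distance, source_height - ear_height)
    # Mirror the source below the ground plane; the straight line from
    # the mirrored source to the ear has the same length as the
    # source -> ground -> ear path.
    reflected = math.hypot(distance, source_height + ear_height)
    return direct, reflected
```

With a source 4 m away and both source and ear at 1.5 m, this gives a direct path of 4 m and a ground-reflected path of 5 m, so the reflected wave arrives later and from below.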

Also, in a case wherein a rectangular parallelepiped common room is assumed as a listening environment, as reflected waves, there are sound waves reflected at the surrounding wall, ceiling, and floor of a listener, and accordingly, the reflected wave direction head-related transfer function regarding each of the reflected wave directions is obtained, and the relevant reflected wave direction head-related transfer functions and direct wave direction head-related transfer functions are convoluted into an audio signal.

In the convoluting, corresponding convolution of the direct wave direction head-related transfer function and the reflected wave direction head-related transfer functions may be executed upon a time series signal of the audio signal from each of a start point in time to start convolution processing of the direct wave direction head-related transfer function, and a start point in time to start convolution processing of each of the reflected wave direction head-related transfer functions, determined according to the path length of sound waves from the virtual sound image localization position to the position of the electro-acoustic conversion unit for each of the direct wave and the reflected waves.

With the above configuration, a start point in time for starting convolution processing of a direct wave direction head-related transfer function, and a start point in time for starting convolution processing of each of a single or multiple reflected wave direction head-related transfer functions are determined according to the path lengths of sound waves from the virtual sound image localization positions of a direct wave and reflected wave to the electro-acoustic conversion unit. In this case, the path length regarding a reflected wave is determined according to a perceived listening environment or room environment.

In other words, the convolution start point in time of each of the head-related transfer functions is set according to the path lengths regarding the direct wave and reflected wave, whereby an appropriate head-related transfer function according to a perceived listening environment or room environment can be convoluted into an audio signal.
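The relation between path length and convolution start point can be sketched as a conversion from metres to samples. This is a minimal sketch under stated assumptions: the speed of sound (343 m/s) and the function name `start_sample` are not given in the text and are introduced only for illustration.

```python
SPEED_OF_SOUND = 343.0  # m/s; an assumed value, not taken from the text

def start_sample(path_length_m, sample_rate_hz, speed=SPEED_OF_SOUND):
    """Convolution start point, in samples, for a given path length."""
    delay_seconds = path_length_m / speed
    return int(round(delay_seconds * sample_rate_hz))

# Hypothetical example: a 3.43 m direct path and a 6.86 m reflected path.
direct_start = start_sample(3.43, 48000)
reflected_start = start_sample(6.86, 48000)
```

Under these assumptions a longer reflected path simply yields a later start sample, which is how a larger perceived room translates into later-arriving reflections.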

With regard to the reflected wave direction head-related transfer functions, gain may be adjusted according to an attenuation rate of sound waves at a perceived reflection portion before the convolution is executed.

With the above configuration, in a perceived listening environment or room environment, a reflected wave direction head-related transfer function in the direction from a reflection portion which reflects a sound wave is adjusted by a gain corresponding to an attenuation rate determined by the material or the like of the relevant reflection portion, and is convoluted into an audio signal. Thus, according to the above configuration, a head-related transfer function, wherein an attenuation rate caused by sound absorption or the like at a reflection portion of a sound wave in a perceived listening environment or room environment is taken into consideration, can be convoluted into an audio signal.
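The combination of separate head-related transfer functions, start points, and gains might be sketched as follows. This is an illustrative sketch only: the function name `mix_paths`, the tuple layout, and the use of NumPy are assumptions, and each gain is taken as a plain linear factor (1.0 for the direct wave, smaller values for reflections).

```python
import numpy as np

def mix_paths(x, paths, out_len):
    """Sum per-path convolutions of the audio signal x.

    paths: iterable of (impulse_response, start_sample, gain), one entry
    for the direct wave and one for each selected reflected wave.
    """
    out = np.zeros(out_len)
    for ir, start, gain in paths:
        # Convolve the path's HRTF impulse response, scaled by its gain.
        y = gain * np.convolve(x, ir)
        # Place the result at the path's start point, clipped to out_len.
        end = min(out_len, start + len(y))
        if end > start:
            out[start:end] += y[:end - start]
    return out
```

Changing only the list of paths (which reflections are included, their start samples, and their gains) changes the simulated environment without any new measurement, which is the point of measuring the direct and reflected directions separately.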

According to the above arrangements, a suitable HRTF can be convoluted into an audio signal, which corresponds to a perceived listening environment or room environment.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram of a system configuration example to which an HRTF (head-related transfer function) measurement method according to an embodiment of the present invention is to be applied;

FIGS. 2A and 2B are diagrams for describing HRTF and natural-state transfer property measurement positions with the HRTF measurement method according to an embodiment of the present invention;

FIG. 3 is a diagram for describing the measurement position of HRTFs in the HRTF measurement method according to an embodiment of the present invention;

FIG. 4 is a diagram for describing the measurement position of HRTFs in the HRTF measurement method according to an embodiment of the present invention;

FIG. 5 is a block diagram illustrating a configuration of a reproduction device to which the HRTF convolution method according to an embodiment of the present invention has been applied;

FIGS. 6A and 6B are diagrams illustrating an example of properties of measurement result data obtained by an HRTF measurement unit and a natural-state transfer property measurement unit with an embodiment of the present invention;

FIGS. 7A and 7B are diagrams illustrating an example of properties of normalized HRTFs obtained by an embodiment of the present invention;

FIG. 8 is a diagram illustrating an example of properties to be compared with properties of normalized HRTFs obtained by an embodiment of the present invention;

FIG. 9 is a diagram illustrating an example of properties to be compared with properties of normalized HRTFs obtained by an embodiment of the present invention;

FIG. 10 is a diagram for describing a convolution process section of a common HRTF according to the related art;

FIG. 11 is a diagram for describing a first example of a convolution process section of a normalized HRTF according to an embodiment of the present invention;

FIG. 12 is a block diagram illustrating a hardware configuration example for implementing the first example of a convolution process section of a normalized HRTF according to an embodiment of the present invention;

FIG. 13 is a diagram for describing a second example of a convolution process section of a normalized HRTF according to an embodiment of the present invention;

FIG. 14 is a block diagram illustrating a hardware configuration example for implementing the second example of a convolution process section of a normalized HRTF according to an embodiment of the present invention;

FIG. 15 is a diagram for describing an example of 7.1 channel multi-surround;

FIG. 16 is a block diagram illustrating a part of an acoustic reproduction system to which an HRTF convolution method according to an embodiment of the present invention has been applied;

FIG. 17 is a block diagram illustrating a part of an acoustic reproduction system to which the HRTF convolution method according to an embodiment of the present invention has been applied;

FIG. 18 is a block diagram illustrating an internal configuration example of the HRTF convolution processing unit in FIG. 16;

FIG. 19 is a diagram for describing an example of the direction of a sound wave for convoluting a normalized HRTF with the HRTF convolution method according to an embodiment of the present invention;

FIG. 20 is a diagram for describing an example of convolution start timing of a normalized HRTF with the HRTF convolution method according to an embodiment of the present invention;

FIG. 21 is a diagram for describing an example of the direction of a sound wave for convoluting a normalized HRTF with the HRTF convolution method according to an embodiment of the present invention;

FIG. 22 is a diagram for describing an example of convolution start timing of a normalized HRTF with the HRTF convolution method according to an embodiment of the present invention;

FIG. 23 is a diagram for describing an example of the direction of a sound wave for convoluting a normalized HRTF with the HRTF convolution method according to an embodiment of the present invention;

FIG. 24 is a diagram for describing an example of convolution start timing of a normalized HRTF with the HRTF convolution method according to an embodiment of the present invention;

FIG. 25 is a diagram for describing an example of the direction of a sound wave for convoluting a normalized HRTF with the HRTF convolution method according to an embodiment of the present invention;

FIG. 26 is a diagram for describing an example of convolution start timing of a normalized HRTF with the HRTF convolution method according to an embodiment of the present invention;

FIGS. 27A through 27F are diagrams for describing an example of convolution start timing of a normalized HRTF with the HRTF convolution method according to an embodiment of the present invention;

FIG. 28 is a diagram for describing an example of the direction of a sound wave for convoluting a normalized HRTF with the HRTF convolution method according to an embodiment of the present invention;

FIG. 29 is a block diagram illustrating a part of another example of an acoustic reproduction system to which the HRTF convolution method according to an embodiment of the present invention has been applied; and

FIG. 30 is a diagram used for describing HRTFs.

DESCRIPTION OF THE PREFERRED EMBODIMENTS

Brief Overview of Embodiment of the Present Invention

As described above, with an HRTF convolution method according to the related art, an arrangement has been made wherein a speaker is disposed in a perceived sound source position to localize a virtual sound image, an HRTF is measured on the premise that it includes not only an impulse response caused by a direct wave from the relevant perceived sound source position but also an impulse response caused by a reflected wave (that is, the impulse responses of the direct wave and reflected wave are both included without being separated), and the measured and obtained HRTF is convoluted into an audio signal without change.

That is to say, heretofore, the HRTF for a direct wave and the HRTF for a reflected wave from a sound source position perceived so as to localize a virtual sound image have been measured as an integral HRTF including both without being separated.

On the other hand, with an embodiment of the present invention, the HRTF for a direct wave and the HRTF for a reflected wave from a sound source position perceived so as to localize a virtual sound image are measured separately beforehand.

Therefore, with the present embodiment, an HRTF regarding a direct wave from a sound source perceived in a particular direction as viewed from a measurement point position (i.e., a sound wave reaching the measurement point position directly, including no reflected wave) is to be obtained. With the direction of a sound wave after being reflected off a wall or the like as a sound source direction, the HRTF for a reflected wave is measured as a direct wave from the sound source direction thereof. That is to say, in the case of considering a reflected wave which is reflected off a predetermined wall, and input to a measurement point position, the sound wave after being reflected off the wall can be regarded as a direct wave from a sound source perceived in the direction of the reflection position at the relevant wall.

Accordingly, with the present embodiment, when measuring an HRTF for a direct wave from a sound source position perceived so as to localize a virtual sound image, an electro-acoustic converter serving as a measuring sound wave generating unit, e.g., a speaker, is disposed in the perceived sound source position so as to localize the relevant virtual sound image, but when measuring an HRTF for a reflected wave from a sound source position perceived so as to localize a virtual sound image, an electro-acoustic converter serving as a measuring sound wave generating unit, e.g., a speaker, is disposed in the incident direction to the measurement point position of a reflected wave to be measured.

Accordingly, an HRTF regarding reflected waves from various directions is measured by disposing an electro-acoustic converter serving as a measuring sound wave generating unit in the incident direction to the measurement point position of each reflected wave.

Subsequently, with the present embodiment, HRTFs regarding a direct wave and reflected waves thus measured are convoluted into an audio signal, thereby obtaining virtual sound image localization within target reproduction acoustic space, but with regard to HRTFs for reflected waves, only a reflected wave in a direction selected according to the target reproduction acoustic space is convoluted into an audio signal.

Also, with the present embodiment, HRTFs regarding a direct wave and reflected waves are measured by removing the propagation delay corresponding to the path length of a sound wave from a measuring sound source position to a measurement point position, and at the time of performing processing for convoluting each of the HRTFs into an audio signal, the propagation delay corresponding to the path length of a sound wave from a measuring sound source position (virtual sound image localization position) to a measurement point position (acoustic reproduction unit position) is taken into consideration.
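One way to picture the removal of the propagation delay from a measured impulse response is to discard the leading samples that correspond to the measurement path length. This is a hedged sketch rather than the measurement procedure itself: the function name `strip_propagation_delay`, the default speed of sound of 343 m/s, and the simple sample-trimming approach are all assumptions made for illustration.

```python
import numpy as np

def strip_propagation_delay(ir, path_length_m, sample_rate_hz, speed=343.0):
    """Drop the leading samples that correspond to the measurement path.

    Returns the trimmed impulse response and the number of samples removed.
    """
    delay = int(round(path_length_m / speed * sample_rate_hz))
    return ir[delay:], delay
```

Since the stored response then begins at the arrival of the wave, a delay chosen for the desired virtual path length can be applied freely at convolution time, independent of the distance used during measurement.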

Thus, an HRTF regarding a virtual sound image localization position arbitrarily set according to the size of a room or the like can be convoluted into an audio signal.

Subsequently, properties such as the degree of reflection, degree of sound absorption, or the like due to the material of a wall or the like relating to the attenuation rate of a reflected sound wave are treated as the gain of a direct wave from the relevant wall. That is to say, with the present embodiment, for example, an HRTF according to a direct wave from a perceived sound source position to a measurement point position is convoluted into an audio signal without attenuation, and also with regard to reflected sound wave components from the wall, an HRTF according to a direct wave from a sound source perceived in the reflection position direction of the wall thereof is convoluted with an attenuation rate according to the degree of reflection or degree of sound absorption corresponding to the properties of the wall.

By listening to the reproduction sound of an audio signal into which an HRTF has thus been convoluted, it can be verified what type of virtual sound image localization state is obtained according to the degree of reflection or degree of sound absorption corresponding to the properties of the wall.

Also, acoustic reproduction from convolution in audio signals of HRTFs of direct waves and HRTFs of selected reflected waves, taking into consideration the attenuation rate, enables simulation of virtual sound image localization in various room environments and place environments. This can be realized by separating a direct wave and reflected waves from the perceived sound source position, and measuring them as HRTFs.

Description of HRTF Measurement Method

As described above, HRTFs regarding a direct wave from which the reflected wave components have been eliminated can be obtained by measuring in an anechoic chamber, for example.

Accordingly, with an anechoic chamber, HRTFs are measured regarding a direct wave from a desired virtual sound image localization position, and perceived multiple reflected waves, and are employed for convolution.

That is to say, with an anechoic chamber, HRTFs are measured by disposing a microphone serving as an acousto-electric conversion unit for collecting a sound wave for measurement in a measurement point position in the vicinity of both ears of a listener, and also disposing a sound source for generating a sound wave for measurement in the positions of the directions of the direct wave and multiple reflected waves.

Incidentally, even if HRTFs are obtained within an anechoic chamber, the properties of the speaker and microphone of a measuring system for measuring an HRTF are not eliminated, which causes a problem wherein the HRTFs measured and obtained have been affected by the properties of the speaker and microphone employed for measurement.

In order to eliminate the effects of properties of the microphones and speakers, using expensive microphones and speakers having excellent properties with flat frequency properties as the microphones and speakers used for measuring the HRTFs can be conceived. However, even such expensive microphones and speakers do not yield ideally flat frequency properties, so there have been cases wherein the effects of the properties of such microphones and speakers could not be completely eliminated, leading to deterioration in the sound quality of the reproduced audio.

Also, eliminating the properties of the microphones and speakers can be conceived by correcting audio signals following convolution of the HRTFs, using inverse properties of the measurement system microphones and speakers, but in this case, there is the problem that a correction circuit has to be provided to the audio signal reproduction circuit, so the configuration becomes complicated, and also correction that completely eliminates the effects of the measurement system is difficult.

In order to eliminate the influence of a room or place for measurement in light of the above-mentioned problems, with the present embodiment, HRTFs are measured within an anechoic chamber, and also in order to eliminate the influence of the properties of a microphone and speaker employed for measurement, the HRTFs measured and obtained are subjected to normalization processing such as described below. First, an embodiment of the HRTF measurement method according to the present embodiment will be described with reference to the drawings.

FIG. 1 is a block diagram of a configuration example of a system for executing processing procedures for obtaining data for a normalized HRTF used with the HRTF measurement method according to an embodiment of the present invention. With this example, an HRTF measurement unit 10 performs measurement of HRTFs in an anechoic chamber, in order to measure head-related transfer properties of direct waves alone. With the HRTF measurement unit 10, in the anechoic chamber, a dummy head or an actual human serving as the listener is situated at the position of the listener, and microphones serving as an acousto-electric conversion unit for collecting sound waves for measurement are situated at positions (measurement point positions) nearby both ears of the dummy head or human, where an electro-acoustic conversion unit for performing acoustic reproduction of audio signals in which the HRTFs have been convoluted is placed.

In a case where the electro-acoustic conversion unit for performing acoustic reproduction of audio signals in which the HRTFs have been convoluted is headphones with two channels of left and right, for example, a microphone for the left channel is situated at the position of the headphone driver of the left channel, and a microphone for the right channel is situated at the position of the headphone driver of the right channel.

Subsequently, a speaker serving as an example of a measurement sound source is situated in one of the directions regarding which an HRTF is to be measured, with the listener or the microphone position serving as a measurement point position as a base point. In this state, measurement sound waves for the HRTF, impulses in this case, are reproduced from this speaker, and impulse responses are picked up with the two microphones. Note that in the following description, a position in a direction regarding which an HRTF is to be measured, where the speaker for the measurement sound source is placed, will be referred to as a “perceived sound source position”.

With the HRTF measurement unit 10, the impulse responses obtained from the two microphones represent HRTFs. With this embodiment, the measurement at the HRTF measurement unit 10 corresponds to a first measuring.

With a natural-state transfer property measurement unit 20, measurement of natural-state transfer properties is performed under the same environment as with the HRTF measurement unit 10. That is to say, with this example, the transfer properties are measured in a natural state wherein there is neither the human nor the dummy head at the listener's position, i.e., there are no obstacles between a measurement sound source position and a measurement point position.

Specifically, with the natural-state transfer property measurement unit 20, the dummy head or human situated with the HRTF measurement unit 10 in the anechoic chamber is removed, creating a natural state with no obstacles between the speaker at the perceived sound source position and the microphones. The placement of the speaker at the perceived sound source position and of the microphones is kept exactly the same as with the HRTF measurement unit 10. In this state, measurement sound waves, impulses in this example, are reproduced by the speaker at the perceived sound source position, and the impulse responses are picked up with the two microphones.

The impulse responses obtained from the two microphones with the natural-state transfer property measurement unit 20 represent natural-state transfer properties with no obstacles such as the dummy head or human.

Note that with the HRTF measurement unit 10 and the natural-state transfer property measurement unit 20, the above-described HRTFs and natural-state transfer properties for the left and right primary components, and HRTFs and natural-state transfer properties for the left and right crosstalk components, are obtained from each of the two microphones. Later-described normalization processing is performed for each of the primary components and left and right crosstalk components. In the following description, normalization processing will be described regarding only the primary components for example, and description of normalization processing regarding the crosstalk components will be omitted, to facilitate description. Of course, normalization processing is performed in the same way regarding the crosstalk components, as well.

The impulse responses obtained with the HRTF measurement unit 10 and the natural-state transfer property measurement unit 20 are output as digital data of 8,192 samples at a sampling frequency of 96 kHz with this example.

Now, the data of the HRTF obtained from the HRTF measurement unit 10 is represented as X(m), where m=0, 1, 2 . . . , M−1 (M=8192), and the data of the natural-state transfer property obtained from the natural-state transfer property measurement unit 20 is represented as Xref(m), where m=0, 1, 2 . . . , M−1 (M=8192).

The HRTF data X(m) from the HRTF measurement unit 10 and the natural-state transfer property data Xref(m) from the natural-state transfer property measurement unit 20 are supplied to delay removal shift-up units 31 and 32, respectively. There, the data of the head portion, starting from the point in time at which reproduction of impulses was started at the speaker, is removed by an amount of delay time equivalent to the arrival time of sound waves from the speaker at the perceived sound source position to the microphones for obtaining impulse responses. Also at the delay removal shift-up units 31 and 32, the number of data is reduced to a power of two, such that orthogonal transform from time-axial data to frequency-axial data can be performed in the next stage downstream.
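The delay removal and shift-up step can be illustrated with a minimal sketch. The function names, the speed of sound (343 m/s), and the rule of truncating to the largest power of two that fits are assumptions made for illustration; the text states only that the head of the data is removed by the arrival delay and that the number of data is reduced to a power of two.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # assumed value; not stated in the text


def delay_samples(distance_m, fs_hz=96_000, c=SPEED_OF_SOUND_M_S):
    """Number of leading samples covered by the speaker-to-microphone travel time."""
    return int(math.floor(distance_m / c * fs_hz))


def largest_power_of_two_at_most(n):
    """Largest power of two that does not exceed n (n >= 1)."""
    return 1 << (n.bit_length() - 1)


def remove_delay_and_shift_up(x, distance_m, fs_hz=96_000):
    """Drop the head of the response, then truncate to a power-of-two length."""
    d = delay_samples(distance_m, fs_hz)
    shifted = x[d:]  # data following the removed head moves up to index 0
    n = largest_power_of_two_at_most(len(shifted))
    return shifted[:n]
```

In such a sketch the same distance would be used for both X(m) and Xref(m), so that the two responses stay time-aligned before they are transformed.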

Next, the HRTF data X(m) and the natural-state transfer property data Xref(m), of which the number of data has been reduced at the delay removal shift-up units 31 and 32, are supplied to FFT (Fast Fourier Transform) units 33 and 34 respectively, and transformed from time-axial data to frequency-axial data. Note that with the present embodiment, the FFT units 33 and 34 perform Complex Fast Fourier Transform (Complex FFT), which takes the phase into consideration.

Due to the complex FFT processing at the FFT unit 33, the HRTF data X(m) is transformed to FFT data made up of a real part R(m) and an imaginary part jI(m), i.e., R(m)+jI(m).

Also, due to the complex FFT processing at the FFT unit 34, the natural-state transfer property data Xref(m) is transformed to FFT data made up of a real part Rref(m) and an imaginary part jIref(m), i.e., Rref(m)+jIref(m).

The FFT data obtained from the FFT units 33 and 34 are X-Y coordinate data, and with this embodiment, polar coordinates conversion units 35 and 36 are further used to convert the FFT data into polar coordinates data. That is to say, the HRTF FFT data R(m)+jI(m) is converted by the polar coordinates conversion unit 35 into a radius γ(m), which is a size component, and an amplitude θ(m), which is an angle component (that is, the argument of the complex value). The radius γ(m) and amplitude θ(m), which are the polar coordinates data, are sent to a normalization and X-Y coordinates conversion unit 37.

Also, the natural-state transfer property FFT data Rref(m)+jIref(m) is converted by the polar coordinates conversion unit 36 into a radius γref(m) and an amplitude θref(m). The radius γref(m) and amplitude θref(m), which are the polar coordinates data, are sent to the normalization and X-Y coordinates conversion unit 37.

At the normalization and X-Y coordinates conversion unit 37, first, the HRTF measured including the dummy head or human is normalized using the natural-state transfer property where there is no obstacle such as the dummy head. Specific computation of the normalization processing is as follows.

With the radius following normalization as γn(m) and the amplitude following normalization as θn(m),

γn(m)=γ(m)/γref(m)

θn(m)=θ(m)−θref(m)  (Expression 1)

holds.
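The per-bin computation can be sketched as follows, assuming the normalization is meant to be equivalent to dividing the HRTF spectrum by the natural-state spectrum bin by bin; in polar form this divides the radii and takes the difference of the angle components. The function name `normalize_bin` is hypothetical.

```python
import cmath


def normalize_bin(h, h_ref):
    """Normalize one frequency bin in polar coordinates and return X-Y (complex) data."""
    radius, angle = cmath.polar(h)              # gamma(m), theta(m)
    radius_ref, angle_ref = cmath.polar(h_ref)  # gamma_ref(m), theta_ref(m)
    radius_n = radius / radius_ref              # radius after normalization
    angle_n = angle - angle_ref                 # angle after normalization (assumed difference)
    return cmath.rect(radius_n, angle_n)        # back to real and imaginary parts
```

Under this assumption the result agrees with the plain complex quotient `h / h_ref`, which is why the polar route and the X-Y route describe the same normalization.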

Subsequently, at the normalization and X-Y coordinates conversion unit 37, the polar coordinate system data following normalization processing, the radius γn(m) and the amplitude θn(m), is converted into normalized HRTF data of frequency-axial data of the real part Rn(m) and imaginary part jIn(m) (m=0, 1 . . . M/4-1) of the X-Y coordinate system.

The normalized HRTF data of the frequency-axial data of the X-Y coordinate system is transformed into an impulse response Xn(m), which is normalized HRTF data of the time axis, at an inverse FFT unit 38. The inverse FFT unit 38 performs Complex Inverse Fast Fourier Transform (Complex Inverse FFT).

That is to say, computation of

Xn(m)=IFFT(Rn(m)+jIn(m))

where m=0, 1, 2 . . . M/2-1, is performed at the inverse FFT (IFFT: Inverse Fast Fourier Transform) unit 38, which obtains the impulse response Xn(m) which is time-axial normalized HRTF data.
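The whole chain from time-axial data to the normalized impulse response can be sketched in a few lines. This is an illustrative sketch only: it uses a direct O(N²) DFT instead of an FFT to stay dependency-free, performs the normalization as a complex division per bin (assumed equivalent to the polar-coordinate computation), and uses made-up sample values.

```python
import cmath


def dft(x):
    """Direct complex DFT (stand-in for the complex FFT of units 33 and 34)."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * cmath.pi * m * k / n) for k in range(n))
            for m in range(n)]


def idft(spec):
    """Direct complex inverse DFT (stand-in for the inverse FFT unit 38)."""
    n = len(spec)
    return [sum(spec[m] * cmath.exp(2j * cmath.pi * m * k / n) for m in range(n)) / n
            for k in range(n)]


def normalized_impulse_response(x, x_ref):
    """Xn = IDFT(DFT(X) / DFT(Xref)); assumes no bin of DFT(Xref) is zero."""
    spec_n = [a / b for a, b in zip(dft(x), dft(x_ref))]
    return [v.real for v in idft(spec_n)]


# Hypothetical data: a 3-tap "head" response measured through a 2-tap
# measurement system (speaker plus microphone), zero-padded to 8 samples.
head = [1.0, 0.5, 0.25, 0.0, 0.0, 0.0, 0.0, 0.0]
x_ref = [1.0, -0.3, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]  # system alone (natural state)
x = [1.0, 0.2, 0.1, -0.075, 0.0, 0.0, 0.0, 0.0]    # head convolved with system
xn = normalized_impulse_response(x, x_ref)
```

With these values the measurement-system response cancels out and `xn` matches `head` to within rounding error, which is the effect the normalization is intended to achieve.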

The normalized HRTF data Xn(m) from the inverse FFT unit 38 is reduced, at an IR (impulse response) simplification unit 39, to an impulse response tap length which can be processed (that is, which can be convoluted, as described later). With this embodiment, this is simplified to 600 taps (600 pieces of data from the head of the data from the inverse FFT unit 38).

The normalized HRTF data Xn(m) (m=0, 1 . . . 599) simplified at the IR simplification unit 39 is written to the normalized HRTF memory 40 for later-described convolution processing. Note that the normalized HRTF written to this normalized HRTF memory 40 includes a normalized HRTF which is a primary component, and a normalized HRTF which is a crosstalk component, at each of the perceived sound source positions (virtual sound image localization positions), as described earlier.
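As a small worked illustration of the simplification, the sketch below keeps the first 600 taps and converts a tap count to a duration at the 96 kHz sampling frequency of this example. The function names are hypothetical.

```python
def simplify_ir(xn, taps=600):
    """Keep only the first `taps` samples of the normalized impulse response."""
    return list(xn[:taps])


def duration_ms(taps, fs_hz=96_000):
    """Length in milliseconds of an impulse response with the given tap count."""
    return 1000.0 * taps / fs_hz
```

At 96 kHz, 600 taps correspond to 6.25 ms of impulse response.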

The description above has been regarding processing for obtaining normalized HRTFs as to a speaker position in a case where a speaker for reproducing impulses, as an example of measurement sound waves, is situated at one perceived sound source position separated by a predetermined distance from the microphone position serving as the measurement point position, in one particular direction as to the listener position.

With this embodiment, the perceived sound source position, which is the position at which the speaker for reproducing the impulses serving as the example of a measuring sound wave is positioned, is changed variously in different directions as to the measurement point position, with a normalized HRTF being obtained for each perceived sound source position.

That is to say, with the present embodiment, HRTFs are obtained regarding not only a direct wave but also reflected waves from a virtual sound image localization position, and accordingly, a virtual sound source position is set to multiple positions in light of the incident direction to measurement point positions for reflected waves, thereby obtaining normalized HRTFs thereof.

Now, the perceived sound source position which is the speaker placement position is changed in increments of 10 degrees at a time, for example, which is a resolution chosen in consideration of the directions of the reflected waves to be obtained, over an angular range of 360 degrees or 180 degrees centered on the microphone position or listener which is the measurement position, within a horizontal plane, to obtain normalized HRTFs regarding reflected waves from both side walls of the listener.

Similarly, the perceived sound source position which is the speaker placement position is changed in increments of 10 degrees at a time, for example, which is a resolution chosen in consideration of the directions of the reflected waves to be obtained, over an angular range of 360 degrees or 180 degrees centered on the microphone position or listener which is the measurement position, within a vertical plane, to obtain a normalized HRTF regarding a reflected wave from the ceiling or floor.
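The set of perceived sound source positions within one plane can be enumerated with a short sketch. The angle convention (0 degrees straight ahead of the listener, the 180 degree range taken as the frontal half from -90 to +90 degrees) and the function names are assumptions for illustration.

```python
import math


def measurement_angles(span_deg=360, step_deg=10):
    """Angles in degrees at which the measurement speaker is placed within one plane."""
    if span_deg == 360:
        return list(range(0, 360, step_deg))
    half = span_deg // 2
    return list(range(-half, half + 1, step_deg))  # frontal arc, both ends included


def speaker_position(angle_deg, radius_m):
    """(x, y) of the speaker; y points straight ahead of the measurement point."""
    a = math.radians(angle_deg)
    return (radius_m * math.sin(a), radius_m * math.cos(a))
```

With 10 degree increments this gives 36 positions for the full circle and 19 positions for the frontal half circle in each plane.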

A case of taking into consideration an angular range of 360 degrees is a case wherein there is a virtual sound image localization position serving as a direct wave behind the listener, for example, a case assuming reproduction of multi-channel surround-sound audio such as 5.1 channels, 6.1 channels, 7.1 channels, and so forth, and also a case of taking into consideration a reflected wave from the wall behind the listener. A case of taking into consideration an angular range of 180 degrees is a case assuming that the virtual sound image localization position is only in front of the listener, or a state where there are no reflected waves from a wall behind the listener.

Also, with this embodiment, the position where the microphones are situated is changed in the measurement of the HRTF and natural-state transfer property at the HRTF measurement unit 10 and the natural-state transfer property measurement unit 20, in accordance with the position of acoustic reproduction drivers, such as the drivers of the headphones, actually supplying the reproduced sound to the listener.

FIGS. 2A and 2B are diagrams for describing HRTF and natural-state transfer property measurement positions (perceived sound source positions) and microphone placement positions serving as measurement point positions, in a case wherein the acoustic reproduction unit serving as the electro-acoustic conversion unit for actually supplying the reproduced sound to the listener is inner headphones.

Specifically, FIG. 2A illustrates a measurement state with the HRTF measurement unit 10 where the acoustic reproduction unit for supplying the reproduced sound to the listener is inner headphones, with a dummy head or human OB situated at the listener position, and with the speaker for reproducing impulses being situated at perceived sound source positions in the directions regarding which HRTFs are to be measured, at 10 degree intervals, centered on the listener position or the center position of the two driver positions of the inner headphones, in this example, as indicated by dots P1, P2, P3, . . . .

Also, with this example of the case of the inner headphones, the two microphones ML and MR are situated at positions within the auditory capsule positions of the ears of the dummy head or human, as shown in FIG. 2A.

FIG. 2B shows a measurement environment state wherein the dummy head or human OB in FIG. 2A has been removed, illustrating a measurement state with the natural-state transfer property measurement unit 20 where the electro-acoustic conversion unit for supplying the reproduced sound to the listener is inner headphones.

The above-described normalization processing is carried out by normalizing the HRTFs measured at each of the perceived sound source positions indicated by dots P1, P2, P3, . . . in FIG. 2A, with the natural-state transfer properties measured in FIG. 2B at the same perceived sound source positions indicated by dots P1, P2, P3, . . . as with FIG. 2A, respectively. For example, an HRTF measured at the perceived sound source position P1 is normalized with the natural-state transfer property measured at the same perceived sound source position P1.

Next, FIG. 3 is a diagram for describing the perceived sound source position and microphone placement position at the time of measuring HRTFs and natural-state transfer properties in the case that the acoustic reproduction unit for supplying the reproduced sound to the listener is over-head headphones. With the over-head headphones of the example in FIG. 3, one headphone driver is provided for each of both ears.

More specifically, FIG. 3 illustrates a measurement state with the HRTF measurement unit 10 where the acoustic reproduction unit for supplying the reproduced sound to the listener is over-head headphones, with a dummy head or human OB being positioned at the listener position, and with the speaker for reproducing impulses being situated at perceived sound source positions in the directions regarding which HRTFs are to be measured, at 10 degree intervals, centered on the listener position or the center position of the two driver positions of the over-head headphones, in this example, as indicated by dots P1, P2, P3, . . . . Also, the two microphones ML and MR are situated at positions near the ears, facing the auditory capsules of the ears of the dummy head or human, as shown in FIG. 3.

The measurement state at the natural-state transfer property measurement unit 20 in the case that the acoustic reproduction unit is over-head headphones is a measurement environment wherein the dummy head or human OB in FIG. 3 has been removed. In this case as well, it is needless to say that measurement of the HRTFs and natural-state transfer properties, and the normalization processing, are performed in the same way as with FIGS. 2A and 2B.

Next, FIG. 4 is a diagram for describing the perceived sound source position and microphone placement position at the time of measuring HRTFs and natural-state transfer properties in the case of placing an electro-acoustic conversion unit serving as the acoustic reproduction unit for supplying the reproduced sound to the listener, speakers for example, in a headrest portion of a chair in which the listener sits. With the example in FIG. 4, HRTFs and natural-state transfer properties are measured for a case wherein two speakers are disposed on the left and right behind the head of a listener, and acoustic reproduction is performed.

More specifically, FIG. 4 illustrates a measurement state with the HRTF measurement unit 10 where the acoustic reproduction unit for supplying the reproduced sound to the listener is speakers positioned in a headrest portion of a chair, with a dummy head or human OB being positioned at the listener position, and with the speaker for reproducing impulses being situated at perceived sound source positions in the directions regarding which HRTFs are to be measured, at 10 degree intervals, centered on the listener position or the center position of the two speaker positions in the headrest portion of the chair, in this example, as indicated by dots P1, P2, P3, . . . .

Also, as shown in FIG. 4, the two microphones ML and MR are situated at positions behind the head of the dummy head or human and near the ears of the listener, which are equivalent to the placement positions of the two speakers attached to the headrest of the chair.

The measurement state at the natural-state transfer property measurement unit 20 in the case that the acoustic reproduction unit is electro-acoustic conversion drivers attached to the headrest of the chair is a measurement environment wherein the dummy head or human OB in FIG. 4 has been removed. In this case as well, it is needless to say that measurement of the HRTFs and natural-state transfer properties, and the normalization processing, are performed in the same way as with FIGS. 2A and 2B.

Next, FIG. 5 is a diagram for describing a perceived sound source position and microphone installation position when measuring HRTFs and natural-state transfer properties in a case wherein the acoustic reproduction unit for supplying reproduced sound to a listener is over-head headphones for 7.1 channel multi-surround, in which seven headphone driver units are disposed for each of both ears. With the example in FIG. 5, seven microphones ML1, ML2, ML3, ML4, ML5, ML6, and ML7, and seven microphones MR1, MR2, MR3, MR4, MR5, MR6, and MR7 are disposed at the positions of the corresponding seven headphone drivers for the left ear and seven headphone drivers for the right ear, facing the left ear and right ear of the listener, respectively.

Subsequently, speakers for reproducing impulses are disposed at perceived sound source positions in the directions in which HRTFs are to be measured, for example, at 10 degree intervals with the listener position or the center position of the seven microphones as the center, as indicated by circles P1, P2, P3, and so on, in the same way as with the above-mentioned cases.

Subsequently, an impulse serving as a sound wave for measurement, reproduced with the speaker at each perceived sound source position, is collected at each of the microphones ML1 through ML7 and MR1 through MR7. In a state in which there is a dummy head or person at the listener position, an HRTF is obtained from each of the output audio signals of the microphones ML1 through ML7 and MR1 through MR7. Also, in a natural state in which there is neither dummy head nor person, natural-state transfer properties are obtained from each of the output audio signals of the microphones ML1 through ML7 and MR1 through MR7. Subsequently, as described above, a normalized HRTF is obtained from each HRTF and the corresponding natural-state transfer property, and is stored in the normalized HRTF memory 40.

In the case of the example in FIG. 5, at the time of localizing a virtual sound image at each perceived sound source direction position, a normalized HRTF to be convoluted into the audio signal supplied to each headphone driver unit is obtained from the output audio signal of the corresponding one of the microphones ML1 through ML7 and MR1 through MR7.

From the above, impulse responses from a virtual sound source position are measured in an anechoic chamber, for example, at 10 degree intervals, centered on the center position of the head of the listener or the center position of the electro-acoustic conversion unit for supplying audio to the listener at the time of reproduction, as shown in FIGS. 2A through 5, so HRTFs can be obtained regarding only a direct wave from the respective virtual sound image localization positions, with reflected waves having been eliminated.

The obtained normalized HRTFs have had the properties of the speaker generating the impulses and the properties of the microphones picking up the impulses eliminated by the normalization processing.

Further, the obtained normalized HRTFs have had a delay removed which corresponds to the distance between the position of the speaker generating the impulses (perceived sound source position) and the position of the microphones for picking up the impulses (assumed driver positions), so they are independent of that distance. That is to say, the obtained normalized HRTFs are HRTFs corresponding to only the direction of the speaker generating the impulses (perceived sound source position) as viewed from the position of the microphones for picking up the impulses (assumed driver positions).

Accordingly, at the time of convolution of the normalized HRTF into the audio signals, providing the audio signals with a delay corresponding to the distance between the virtual sound source position and the assumed driver position enables acoustic reproduction in which the virtual sound image localization position is at the distance corresponding to the delay, in the direction of the perceived sound source position as to the assumed driver positions. With reflected waves from the direction of the perceived sound source position, this can be achieved by providing the audio signals with a delay corresponding to the path length of sound waves from the position at which virtual sound image localization is desired, reflected off of reflection portions such as walls or the like, and input to the assumed driver position from the direction of the perceived sound source position.

That is to say, in the case of convoluting a normalized HRTF into an audio signal regarding a direct wave and reflected waves, the audio signal is subjected to a delay corresponding to the path length of the sound wave input from a desired virtual sound image localization position to the assumed driver position.
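The delay for each path can be sketched as a conversion from path length to samples. The speed of sound, the two-dimensional geometry, and the image-source construction used here to obtain a first-order reflection path are assumptions for illustration and are not taken from the text; the function names are hypothetical.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # assumed value; not stated in the text


def path_delay_samples(path_length_m, fs_hz=96_000, c=SPEED_OF_SOUND_M_S):
    """Delay in samples for a sound wave travelling the given path length."""
    return round(path_length_m / c * fs_hz)


def first_order_reflection(source, listener, wall_x):
    """Mirror the source across a wall at x = wall_x (image-source assumption).

    Returns the image position and the reflected path length. The direction of
    the image as seen from the listener is the incident direction of the
    reflected wave.
    """
    sx, sy = source
    image = (2 * wall_x - sx, sy)
    return image, math.dist(image, listener)
```

For instance, with a virtual source 2 m in front of the listener and a side wall 1.5 m to the right, the reflected path is longer than the direct path, so the reflected-wave signal receives the larger delay before its normalized HRTF is convoluted.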

Note that the signal processing in the block diagram in FIG. 1 for describing an embodiment of the HRTF measurement method can all be performed by a DSP (Digital Signal Processor). In this case, the obtaining units of the HRTF data X(m) and natural-state transfer property data Xref(m) of the HRTF measurement unit 10 and natural-state transfer property measurement unit 20, the delay removal shift-up units 31 and 32, the FFT units 33 and 34, the polar coordinates conversion units 35 and 36, the normalization and X-Y coordinates conversion unit 37, the inverse FFT unit 38, and the IR simplification unit 39 can each be configured as a DSP, or the entire signal processing can be configured with a single DSP or multiple DSPs.

Note that with the example in FIG. 1 described above, the data of HRTFs and natural-state transfer properties is subjected to removal of head data of an amount of delay time corresponding to the distance between the perceived sound source position and the microphone position at the delay removal shift-up units 31 and 32, in order to reduce the amount of processing regarding later-described convolution for the HRTFs. The data following the removed portion is shifted up to the head, and this data removal processing is performed using memory within the DSP, for example. However, in cases wherein this delay-removal shift-up can be done without, the DSP may perform processing of the original data with the unaltered 8,192 samples of data.

Also, the IR simplification unit 39 is for reducing the amount of convolution processing at the time of the later-described convolution processing of the HRTFs, and accordingly this can be omitted.

Further, in the above-described embodiment, the reason that the frequency-axial data of the X-Y coordinate system from the FFT units 33 and 34 is converted into frequency data of a polar coordinate system is that cases are taken into consideration where normalization processing does not work in the state of frequency data of the X-Y coordinate system, so with an ideal configuration, normalization processing can be performed with frequency data of the X-Y coordinate system as it is.

Note that with the above-described example, normalized HRTFs are obtained regarding a great number of perceived sound source positions, assuming various virtual sound image localization positions and the perceived driver positions of the incident directions of the reflected waves thereof. The reason why normalized HRTFs regarding the multiple perceived sound source positions have been thus obtained is for enabling an HRTF in the direction of an employed perceived sound source position to be selected therefrom later. However, it goes without saying that in a case wherein a virtual sound source localization position is fixed beforehand, and the incident direction of a reflected wave is determined beforehand, normalized HRTFs as to the fixed virtual sound image localization position and the perceived sound source position in the incident direction of a reflected wave may be obtained.

Now, while measurement is performed in an anechoic chamber in the above-described embodiment in order to measure the HRTFs and natural-state transfer properties regarding only the direct waves from multiple perceived sound source positions, direct wave components can be extracted even in rooms with reflected waves rather than an anechoic chamber, if the reflected waves are greatly delayed as to the direct waves, by applying a time window to the direct wave components.
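A time window of this kind can be sketched as follows. The window shape (flat, with an optional raised-cosine fade-out ending just before the first reflection arrives) and the function names are assumptions for illustration; the text states only that a time window is applied to the direct wave components.

```python
import math


def direct_wave_window(length, cutoff, fade=0):
    """Window that passes samples before `cutoff` and zeroes everything after it.

    `cutoff` is the sample index at which the first reflection is assumed to
    arrive; the last `fade` samples before it are tapered with a raised cosine.
    """
    window = []
    for n in range(length):
        if n < cutoff - fade:
            window.append(1.0)
        elif n < cutoff:
            t = (n - (cutoff - fade) + 1) / (fade + 1)
            window.append(0.5 * (1.0 + math.cos(math.pi * t)))
        else:
            window.append(0.0)
    return window


def extract_direct_wave(ir, cutoff, fade=0):
    """Apply the window to a measured impulse response."""
    window = direct_wave_window(len(ir), cutoff, fade)
    return [s * g for s, g in zip(ir, window)]
```

The approach only works when the reflections arrive clearly later than the end of the direct-wave response, as the text notes.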

Also, by using TSP (Time Stretched Pulse) signals instead of impulses for the measurement sound waves for HRTFs emitted by the speaker at the perceived sound source positions, reflected waves can be eliminated and HRTFs and natural-state transfer properties can be measured regarding direct waves alone even if not in an anechoic chamber.

Verification of Advantages of Employing Normalized HRTF

FIGS. 6A and 6B show properties of a measurement system including the speaker and microphones actually used for HRTF measurement. FIG. 6A illustrates frequency properties of output signals from the microphones when sound of frequency signals from 0 to 20 kHz is reproduced at the same constant level by the speaker in a state where an obstacle such as the dummy head or human is not inserted, and picked up with the microphones.

The speaker used here is an industrial-use speaker which is supposed to have quite good properties, but even then properties as shown in FIG. 6A are exhibited, and flat frequency properties are not obtained. Actually, the properties shown in FIG. 6A are recognized as being excellent properties, belonging to a fairly flat class of general speakers.

With the related art, the properties of the speaker and microphones are added to the HRTF and are not removed, so the properties and sound quality of the sound obtained with the HRTFs convoluted are affected by the properties of the speaker and microphones.

FIG. 6B illustrates frequency properties of output signals from the microphones in a state where an obstacle such as a dummy head or human is inserted, under the same conditions. It can be seen that there are great dips near 1200 Hz and near 10 kHz, illustrating that the frequency properties change greatly.

FIG. 7A is a frequency property diagram illustrating the frequency properties of FIG. 6A and the frequency properties of FIG. 6B overlaid. On the other hand, FIG. 7B illustrates normalized HRTF properties according to the embodiment described above. It can be seen from this FIG. 7B that gain does not drop with the normalized HRTF properties, even in the lowband.

With the embodiment according to the present invention described above, complex FFT processing is performed, and normalized HRTFs are used taking into consideration the phase component, so the normalized HRTFs are higher in fidelity as compared to cases of using HRTFs normalized only with the amplitude component.

An arrangement wherein processing for normalizing the amplitude alone without taking into consideration the phase is performed, and the impulse properties remaining at the end are subjected to FFT again to obtain properties, is shown in FIG. 8. As can be understood by comparing this FIG. 8 with FIG. 7B, which shows the properties of the normalized HRTF according to the present embodiment, the difference in property between the HRTF X(m) and natural-state transfer property Xref(m) is correctly obtained with the complex FFT as shown in FIG. 7B, but in a case of not taking the phase into consideration, this deviates from what it should be, as shown in FIG. 8.

Also, in the processing procedures in FIG. 1 described above, the IR simplification unit 39 performs simplification of the normalized HRTFs at the end, so deviation of properties is less as compared to a case where the number of data is reduced from the beginning.

That is to say, in the event of performing simplification for reducing the number of data first on the data obtained with the HRTF measurement unit 10 and natural-state transfer property measurement unit 20 (that is, a case of performing normalization with the data following the number of taps used at the end set to 0), the properties of the normalized HRTFs are as shown in FIG. 9, with particular deviation in lowband properties. On the other hand, the properties of the normalized HRTFs obtained with the configuration of the embodiment described above are as shown in FIG. 7B, with little deviation even in lowband properties.

Description of HRTF Convolution Method

FIG. 10 illustrates an impulse response serving as an example of an HRTF obtained by a measurement method according to the related art, which is an integral response including a direct wave as well as all of the reflected wave components. Heretofore, as shown in FIG. 10, the entirety of an integral impulse response including a direct wave and all of the reflected waves is convoluted into an audio signal within one convolution process section.

The reflected waves include high-order reflected waves, and also include reflected waves of which the path length from a virtual sound image localization position to a measurement point position is long, and accordingly, a convolution process section according to the related art becomes a relatively long section such as shown in FIG. 10. Note that the top section DLO within the convolution process section indicates a delay amount equivalent to the time taken for a direct wave from a virtual sound image localization position to reach a measurement point position.

In contrast to the HRTF convolution method according to the related art such as in FIG. 10, with the present embodiment, a normalized HRTF for a direct wave obtained as described above, and selected normalized HRTFs for reflected waves, are convoluted into an audio signal.

Basically, with the present embodiment, when determining a virtual sound image localization position, a normalized HRTF for a direct wave between the virtual sound image localization position and a measurement point position (acoustic reproduction driver installation position) is convoluted into an audio signal. Note however, with regard to normalized HRTFs for reflected waves, only an HRTF selected according to a perceived listening environment, room configuration, or the like is convoluted into an audio signal.

For example, in the case of perceiving a listening environment such as the above-mentioned vast plain, only a reflected wave from a virtual sound image localization position off the ground surface (floor) is selected from among the reflected waves, and a normalized HRTF obtained in the direction from which the relevant reflected wave is input to the measurement point position is convoluted into an audio signal. Also, for example, in the case of a common rectangular parallelepiped shaped room, all of the reflected waves from the ceiling, floor, walls on the left and right of the listener, and walls in front of and behind the listener are selected, and normalized HRTFs obtained in the directions from which these reflected waves are input to the measurement point positions are convoluted.

Also, in the case of the latter room, a secondary reflection, tertiary reflection, and so forth as well as a primary reflection arise as reflected waves, but, for example, the primary reflections alone are selected. According to an experiment, even when an audio signal into which only normalized HRTFs regarding primary reflections are convoluted is reproduced acoustically, an excellent virtual sound image localization feeling is obtained. Note that if normalized HRTFs regarding secondary and higher-order reflected waves are also convoluted into an audio signal, an even better virtual sound image localization feeling is obtained in some cases when the audio signal is reproduced acoustically.

A normalized HRTF regarding a direct wave is basically convoluted into an audio signal without changing the gain thereof, but with regard to reflected waves, a normalized HRTF is convoluted into an audio signal with gain corresponding to whether the reflected wave is a primary reflection, a secondary reflection, or a further higher-order reflection. This is because normalized HRTFs obtained with the present embodiment are each measured regarding a direct wave from a perceived sound source position set in a predetermined direction, whereas a reflected wave arriving from the relevant predetermined direction is attenuated relative to the direct wave. Note that the higher the order of a reflected wave is, the greater the attenuation amount of the normalized HRTF regarding the reflected wave relative to a direct wave.

Also, as described above, with regard to HRTFs of reflected waves, the present embodiment enables gain to be set further in light of the degree of sound absorption (attenuation rate of a sound wave) corresponding to the surface shape, surface configuration, material, or the like of a perceived reflection portion.
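The dependence of gain on reflection order and on the reflection portion can be illustrated with a small sketch. The model below is an assumption made only for illustration: it treats the gain of a reflected wave as the product of one hypothetical reflection coefficient per surface struck, which is not a formula stated in the present specification. The function name `reflection_gain` and the coefficient values are likewise hypothetical.

```python
def reflection_gain(reflection_coefficients):
    """Gain for one reflected wave under an assumed multiplicative model.

    reflection_coefficients holds one value per surface the wave strikes,
    each in [0, 1); an empty list corresponds to the direct wave (gain 1).
    """
    gain = 1.0
    for coefficient in reflection_coefficients:
        if not 0.0 <= coefficient < 1.0:
            raise ValueError("each reflection coefficient must be in [0, 1)")
        gain *= coefficient
    return gain


# Hypothetical materials: a wall reflecting 70 % and a carpet reflecting 50 %.
g_primary = reflection_gain([0.7])         # one bounce off the wall
g_secondary = reflection_gain([0.7, 0.5])  # wall, then carpet
```

Under this assumed model, each additional reflection multiplies in a factor smaller than 1, so a higher-order reflection never receives more gain than the lower-order path it extends, which agrees with the tendency described above.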

As described above, with the present embodiment, a reflected wave for convoluting an HRTF is selected, and the gain of the HRTF of each reflected wave is adjusted, whereby convolution of an HRTF as to an audio signal can be performed according to an arbitrary perceived room environment and listening environment. That is to say, an HRTF for a room or space perceived to provide an excellent acoustic field space can be convoluted into an audio signal without measuring, as in the related art, an HRTF in a room or space which provides an excellent acoustic field.

First Example of Convolution Method (FIGS. 11 and 12)

With the present embodiment, a normalized HRTF for a direct wave (direct wave direction HRTF), and a normalized HRTF for each of reflected waves (reflected wave direction HRTF) are, as described above, obtained independently, and accordingly, with a first example, HRTFs for a direct wave and each of reflected waves are convoluted into an audio signal independently.

For example, a case will be described wherein three reflected waves (reflected wave directions) as well as a direct wave (direct wave direction) are selected, and the normalized HRTFs corresponding to both (direct wave direction HRTF and reflected wave direction HRTFs) are convoluted.

Delay time corresponding to the path length from a virtual sound image localization position to a measurement point position is obtained as to each of a direct wave and reflected waves beforehand. This delay time is obtained by a calculation if a measurement point position (acoustic reproduction driver position) and virtual sound image localization position are determined, and a reflection portion is determined. Subsequently, with regard to the reflected waves, the attenuation amount (gain) as to a normalized HRTF is also determined beforehand.
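The delay calculation mentioned above can be sketched as follows. The sketch rests on assumptions that are not taken from the present specification: a speed of sound of 343 m/s, a floor lying in the plane z = 0, the use of a mirror-image source to obtain the path length of a single floor reflection, and hypothetical positions and sampling frequency. The function names are illustrative only.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # assumed value (air at roughly 20 degrees C)


def path_delay_samples(path_length_m, sample_rate_hz):
    """Delay in samples for a sound wave travelling path_length_m metres."""
    return round(path_length_m / SPEED_OF_SOUND_M_S * sample_rate_hz)


def floor_reflection_path_length(source, listener):
    """Path length of a single floor bounce, using a mirror-image source.

    Assumes the floor is the plane z = 0; the source is mirrored below it.
    """
    sx, sy, sz = source
    mirrored_source = (sx, sy, -sz)
    return math.dist(mirrored_source, listener)


source = (0.0, 2.0, 1.2)    # hypothetical virtual sound image position (m)
listener = (0.0, 0.0, 1.2)  # hypothetical measurement point position (m)
fs = 48000                  # hypothetical sampling frequency (Hz)

dl0 = path_delay_samples(math.dist(source, listener), fs)
dl1 = path_delay_samples(floor_reflection_path_length(source, listener), fs)
```

With these assumed values the reflected path is longer than the direct path, so the delay for the reflected wave exceeds the delay for the direct wave.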

FIG. 11 illustrates an example of delay time, gain, and further convolution processing sections regarding a direct wave and three reflected waves. With the example in FIG. 11, with regard to a normalized HRTF for a direct wave (direct wave direction HRTF), delay DL0 equivalent to the time taken for the direct wave to reach a measurement point position from a virtual sound image localization position is taken into consideration as to an audio signal. That is to say, a convolution start point of the normalized HRTF for the direct wave becomes a point in time t0 obtained by delaying the audio signal by the above-mentioned delay DL0, such as shown at the bottom of FIG. 11.

Subsequently, the normalized HRTF regarding the direction of the relevant direct wave obtained as described above is convoluted into the audio signal at a convolution process section CP0 of data length worth of the relevant normalized HRTF (600 pieces worth of data in the above example) which is started from the above-mentioned point in time t0.

Next, of the three reflected waves, with regard to the normalized HRTF of a first reflected wave 1 (reflected wave direction HRTF), delay DL1 corresponding to a path length where the first reflected wave reaches a measurement point position from a virtual sound image localization position is taken into consideration as to the audio signal. That is to say, a convolution start point of the normalized HRTF for the first reflected wave 1 becomes a point in time t1 obtained by delaying the audio signal by the delay DL1, which is shown at the bottom of FIG. 11.

Subsequently, the normalized HRTF regarding the direction of the first reflected wave 1 obtained as described above (reflected wave direction HRTF) is convoluted into the audio signal at a convolution process section CP1 of data length worth of the relevant normalized HRTF (600 pieces worth of data in the above example) which is started from the above-mentioned point in time t1. At the time of this convolution processing, the above-mentioned normalized HRTF is multiplied by gain G1 (G1<1) in light of what order the first reflected wave 1 is, and the degree of sound absorption (or the degree of reflection) at a reflection portion.

Also, similarly, with regard to the normalized HRTFs of a second reflected wave 2 and third reflected wave 3 (reflected wave direction HRTFs), delays DL2 and DL3 corresponding to the path lengths where the second reflected wave and third reflected wave reach a measurement point position from a virtual sound image localization position are taken into consideration as to the audio signal. That is to say, as shown at the bottom of FIG. 11, a convolution start point of the normalized HRTF for the second reflected wave 2 becomes a point in time t2 obtained by delaying the audio signal by the delay DL2, and a convolution start point of the normalized HRTF for the third reflected wave 3 becomes a point in time t3 obtained by delaying the audio signal by the delay DL3.

Subsequently, the normalized HRTF regarding the direction of the second reflected wave 2 obtained as described above (reflected wave direction HRTF) is convoluted into the audio signal at a convolution process section CP2 of data length worth of the relevant normalized HRTF (600 pieces worth of data in the above example) which is started from the above-mentioned point in time t2, and the normalized HRTF regarding the direction of the third reflected wave 3 obtained as described above (reflected wave direction HRTF) is convoluted into the audio signal at a convolution process section CP3 of data length worth of the relevant normalized HRTF (600 pieces worth of data in the above example) which is started from the above-mentioned point in time t3.

At the time of this convolution processing, the above-mentioned normalized HRTFs are multiplied by gain G2 and G3 (G2<1 and G3<1) in light of what order each of the second reflected wave 2 and third reflected wave 3 is, and the degree of sound absorption (or the degree of reflection) at a reflection portion.

FIG. 12 illustrates a hardware configuration example of a normalized HRTF convolution unit configured to execute the convolution processing of the example in FIG. 11 described above.

The example in FIG. 12 is configured of a convolution processing unit 51 for a direct wave, convolution processing units 52, 53, and 54 for the first through third reflected waves 1, 2, and 3, and an adder 55. Each of the convolution processing units 51 through 54 has exactly the same configuration. With this example, the convolution processing units 51 through 54 are configured of delay units 511, 521, 531, and 541, HRTF convolution circuits 512, 522, 532, and 542, normalized HRTF memory 513, 523, 533, and 543, gain adjustment units 514, 524, 534, and 544, and gain memory 515, 525, 535, and 545, respectively.

With this example, an input audio signal Si into which an HRTF should be convoluted is supplied to each of the delay units 511, 521, 531, and 541. The delay units 511, 521, 531, and 541 delay the input audio signal Si into which an HRTF should be convoluted to the convolution start points in time t0, t1, t2, and t3 of the normalized HRTFs for the direct wave and first through third reflected waves, respectively. Accordingly, with this example, as shown in the drawing, the delay amounts of the delay units 511, 521, 531, and 541 are determined as DL0, DL1, DL2, and DL3, respectively.

Each of the HRTF convolution circuits 512, 522, 532, and 542 is a portion to execute processing for convoluting a normalized HRTF into an audio signal, and with this example, is configured of an IIR (Infinite Impulse Response) filter or FIR (Finite Impulse Response) filter of 600 taps.

The normalized HRTF memory 513, 523, 533, and 543 are for storing and holding a normalized HRTF to be convoluted at each of the HRTF convolution circuits 512, 522, 532, and 542. The normalized HRTF memory 513 stores and holds a normalized HRTF regarding the direction of a direct wave, the normalized HRTF memory 523 stores and holds a normalized HRTF regarding the direction of the first reflected wave, the normalized HRTF memory 533 stores and holds a normalized HRTF regarding the direction of the second reflected wave, and the normalized HRTF memory 543 stores and holds a normalized HRTF regarding the direction of the third reflected wave, respectively.

The stored and held normalized HRTF regarding the direction of a direct wave, the stored and held normalized HRTF regarding the direction of the first reflected wave, the stored and held normalized HRTF regarding the direction of the second reflected wave, and the stored and held normalized HRTF regarding the direction of the third reflected wave are, for example, selected and read out from the above-mentioned normalized HRTF memory 41, and are written in the corresponding normalized HRTF memory 513, 523, 533, and 543, respectively.

The gain adjustment units 514, 524, 534, and 544 are for adjusting the gain of a normalized HRTF to be convoluted. The gain adjustment units 514, 524, 534, and 544 multiply the normalized HRTFs from the normalized HRTF memory 513, 523, 533, and 543 by the gain values (≦1) stored in the gain memory 515, 525, 535, and 545, and supply the multiplication results to the HRTF convolution circuits 512, 522, 532, and 542, respectively.

With this example, the gain value G0 (≦1) regarding a direct wave is stored in the gain memory 515, the gain value G1 (<1) regarding the first reflected wave is stored in the gain memory 525, the gain value G2 (<1) regarding the second reflected wave is stored in the gain memory 535, and the gain value G3 (<1) regarding the third reflected wave is stored in the gain memory 545.

The adder 55 adds and composites the audio signals into which the normalized HRTFs from the convolution processing unit 51 for a direct wave, and the convolution processing units 52, 53, and 54 for the first through third reflected waves have been convoluted, and outputs an output audio signal So.

With such a configuration, an input audio signal Si into which an HRTF should be convoluted is supplied to each of the delay units 511, 521, 531, and 541, and the respective input audio signals Si are delayed to the convolution start points in time t0, t1, t2, and t3 of the normalized HRTFs for the direct wave and first through third reflected waves. The input audio signals Si delayed to the convolution start points in time t0, t1, t2, and t3 of the HRTFs at the delay units 511, 521, 531, and 541 are supplied to the HRTF convolution circuits 512, 522, 532, and 542.

On the other hand, the stored and held normalized HRTF data is read out sequentially from each of the normalized HRTF memory 513, 523, 533, and 543, starting at each of the convolution start points in time t0, t1, t2, and t3. Description of the readout timing control of the normalized HRTF data from each of the normalized HRTF memory 513, 523, 533, and 543 will be omitted here.

The readout normalized HRTF data is subjected to gain adjustment by being multiplied by the gain G0, G1, G2, and G3 from the gain memory 515, 525, 535, and 545 at each of the gain adjustment units 514, 524, 534, and 544, following which it is supplied to each of the HRTF convolution circuits 512, 522, 532, and 542.

With each of the HRTF convolution circuits 512, 522, 532, and 542, the gain-adjusted normalized HRTF data is subjected to convolution processing at each of the convolution process sections CP0, CP1, CP2, and CP3 shown in FIG. 11. Subsequently, the convolution processing results at each of the HRTF convolution circuits 512, 522, 532, and 542 are added at the adder 55, and the addition results are output as an output audio signal So.
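The signal flow of FIG. 12 (delay, gain adjustment, convolution, and addition for each of the direct wave and reflected waves) can be sketched in software as follows. This is a minimal illustration and not the circuit of the embodiment: the function names `convolve` and `render_paths`, the list-based signal representation, and the short two-sample responses are assumptions chosen for readability, whereas the example above uses 600 pieces of data per normalized HRTF.

```python
def convolve(signal, kernel):
    """Direct-form convolution of two sequences of samples."""
    out = [0.0] * (len(signal) + len(kernel) - 1)
    for i, s in enumerate(signal):
        for j, k in enumerate(kernel):
            out[i + j] += s * k
    return out


def render_paths(signal, paths):
    """Convolute one HRTF per path and add the results.

    paths is a list of (delay_samples, gain, hrtf) tuples, one for the
    direct wave and one for each selected reflected wave.
    """
    total_len = max(d + len(h) for d, _, h in paths) + len(signal) - 1
    output = [0.0] * total_len
    for delay, gain, hrtf in paths:
        scaled = [gain * c for c in hrtf]        # gain adjustment unit
        delayed = [0.0] * delay + list(signal)   # delay unit
        for n, value in enumerate(convolve(delayed, scaled)):  # convolution
            output[n] += value                   # adder
    return output


# Hypothetical data: a unit impulse, a direct wave and one reflected wave.
paths = [
    (0, 1.0, [1.0, 0.5]),    # direct wave: DL0 = 0, G0 = 1
    (3, 0.5, [1.0, -1.0]),   # reflected wave: DL1 = 3, G1 = 0.5
]
output = render_paths([1.0], paths)
```

Because each path has its own delay, gain, and response, any one of them can be changed without touching the others, which corresponds to the flexibility of the first example.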

In the case of the first example, each of the normalized HRTFs regarding a direct wave and multiple reflected waves can be convoluted into an audio signal independently, so the delay amounts at the delay units 511, 521, 531, and 541, and gain stored in the gain memory 515, 525, 535, and 545 are adjusted, and further, the normalized HRTFs to be stored in the normalized HRTF memory 513, 523, 533, and 543 and convoluted are changed, whereby convolution of HRTFs can be readily performed according to the difference of a listening environment, such as the difference of listening environment space types such as indoor, outdoor, or the like, the difference of the shape and size of a room, and the material of a reflection portion (the degree of sound absorption and degree of reflection), and so forth.

In a case wherein the delay units 511, 521, 531, and 541 are configured of variable delay units capable of varying a delay amount according to external operation input by an operator or the like, and there are further provided a unit for writing an arbitrary normalized HRTF selected by the operator from the normalized HRTF memory 40 into the normalized HRTF memory 513, 523, 533, and 543, and a unit for allowing the operator to input and store arbitrary gain in the gain memory 515, 525, 535, and 545, convolution of an HRTF can be performed according to a listening environment set arbitrarily by the operator, such as a listening environment space, room environment, or the like.

For example, in a listening environment having exactly the same room shape, gain can be readily changed according to the material of a wall (the degree of sound absorption and degree of reflection), and a virtual sound image localization state can be simulated according to a situation wherein the material of a wall is changed variously.

Note that, with the arrangement of the example in FIG. 12, instead of providing the normalized HRTF memory 513, 523, 533, and 543 as to the convolution processing unit 51 for a direct wave, and the convolution processing units 52, 53, and 54 for the first through third reflected waves respectively, an arrangement may be made wherein the normalized HRTF memory 40 is provided, which is common to the convolution processing units 51 through 54, and a unit configured to selectively read out an HRTF employed by each of the convolution processing units 51 through 54 from the normalized HRTF memory 40 is provided in each of the convolution processing units 51 through 54.

Note that the above-mentioned first example describes the case wherein, in addition to a direct wave, three reflected waves are selected, and these normalized HRTFs are convoluted into an audio signal, but in a case wherein there are three or more normalized HRTFs regarding reflected waves to be selected, with the configuration in FIG. 12, the same convolution processing units as the convolution processing units 52, 53, and 54 for reflected waves are provided as appropriate, whereby convolution of these normalized HRTFs can be performed in exactly the same way.

Note that, with the example in FIG. 12, an arrangement is made wherein the delay units 511, 521, 531, and 541 each delay the input signal Si until a convolution start point in time, so the respective delay amounts are set to DL0, DL1, DL2, and DL3. However, if an arrangement is made wherein the output end of the delay unit 511 is connected to the input end of the delay unit 521, the output end of the delay unit 521 is connected to the input end of the delay unit 531, and the output end of the delay unit 531 is connected to the input end of the delay unit 541, the delay amounts at the delay units 521, 531, and 541 can be set to DL1-DL0, DL2-DL1, and DL3-DL2, and accordingly, can be reduced.

Also, in a case wherein the convolution process sections CP0, CP1, CP2, and CP3 do not overlap one another, the delay circuits and convolution circuits may be connected in series while taking the time lengths of the convolution process sections CP0, CP1, CP2, and CP3 into consideration. In this case, if we say that the time lengths of the convolution process sections CP0, CP1, CP2, and CP3 are TP0, TP1, TP2, and TP3, the delay amounts at the delay units 521, 531, and 541 can be regarded as DL1-DL0-TP0, DL2-DL1-TP1, and DL3-DL2-TP2, and accordingly, can be reduced further.
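The reduced delay amounts of the two series arrangements follow directly from the absolute delays DL0 through DL3 and the section lengths TP0 through TP3, as the following sketch shows. The function names and the numerical values are hypothetical and serve only to illustrate the arithmetic.

```python
def cascaded_delays(absolute_delays):
    """Per-unit delays when each delay unit feeds the next one.

    Returns [DL0, DL1-DL0, DL2-DL1, ...].
    """
    stages = [absolute_delays[0]]
    for previous, current in zip(absolute_delays, absolute_delays[1:]):
        if current < previous:
            raise ValueError("delays must be in non-decreasing order")
        stages.append(current - previous)
    return stages


def serial_delays(absolute_delays, section_lengths):
    """Per-unit delays when delay and convolution circuits alternate in series.

    Returns [DL0, DL1-DL0-TP0, DL2-DL1-TP1, ...]; valid only when the
    convolution process sections do not overlap.
    """
    stages = [absolute_delays[0]]
    for i in range(1, len(absolute_delays)):
        gap = absolute_delays[i] - absolute_delays[i - 1] - section_lengths[i - 1]
        if gap < 0:
            raise ValueError("convolution process sections overlap")
        stages.append(gap)
    return stages


dl = [280, 1000, 1800, 2700]  # hypothetical DL0..DL3 in samples
tp = [600, 600, 600, 600]     # TP0..TP3, 600 samples each

cascade = cascaded_delays(dl)
serial = serial_delays(dl, tp)
```

For these assumed values the per-unit delays shrink from the absolute values to [280, 720, 800, 900] in the cascaded arrangement and to [280, 120, 200, 300] when the section lengths are also subtracted.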

Second Example of Convolution Method (Coefficient Composite Processing, FIGS. 13 and 14)

This second example is employed in a case wherein an HRTF regarding a predetermined listening environment is convoluted. That is to say, in a case wherein a listening environment is determined beforehand, such as the type of listening environment space, the shape and size of a room, the material of a reflection portion (the degree of sound absorption and degree of reflection), or the like, the convolution start points in time of the normalized HRTFs regarding a direct wave and selected reflected waves are determined beforehand, and the attenuation amount (gain) at the time of convoluting each of the normalized HRTFs is also determined beforehand.

For example, taking HRTFs regarding a direct wave and three reflected waves as an example, as shown in FIG. 13, the convolution start points in time of the normalized HRTFs for the direct wave and first through third reflected waves become the above-mentioned start points in time t0, t1, t2, and t3, and the delay amounts as to the audio signal become DL0, DL1, DL2, and DL3, respectively. Subsequently, the gain at the time of convolution of the normalized HRTFs regarding the direct wave and first through third reflected waves can be determined as G0, G1, G2, and G3, respectively.

Therefore, with the second example, as shown in FIG. 13, those normalized HRTFs are composited in a time-oriented manner to generate a composite normalized HRTF, and a convolution process section is set to a period until convolution of the multiple normalized HRTFs as to an audio signal is completed.

Here, as shown in FIG. 13, the substantial convolution sections of the respective normalized HRTFs are CP0, CP1, CP2, and CP3, and there is no HRTF data in sections other than the convolution sections CP0, CP1, CP2, and CP3, and accordingly, data zero is employed as an HRTF in such sections.

In the case of the second example, a hardware configuration example of a normalized HRTF convolution unit is shown in FIG. 14. Specifically, with the second example, an input audio signal Si into which an HRTF should be convoluted is delayed at a delay unit 61 regarding an HRTF for a direct wave by a predetermined delay amount regarding the direct wave, following which it is supplied to an HRTF convolution circuit 62.

A composite normalized HRTF from composite normalized HRTF memory 63 is supplied to the HRTF convolution circuit 62, and is convoluted into an audio signal. The composite normalized HRTF stored in the composite normalized HRTF memory 63 is the composite normalized HRTF described with reference to FIG. 13.

The second example involves rewriting of the entire composite normalized HRTF even in the case of changing a delay amount, gain, or the like, but as shown in FIG. 14, has the advantage that the hardware configuration of a circuit for convoluting an HRTF can be simplified.
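Generation of a composite normalized HRTF such as that of FIG. 13 can be sketched as follows. The sketch assumes that the start offset of each normalized HRTF is expressed relative to the start point t0 of the direct wave, since the delay for the direct wave is applied separately by the delay unit 61; the function name `composite_hrtf` and the short responses are hypothetical.

```python
def composite_hrtf(paths):
    """Composite several gain-adjusted HRTFs on one time axis.

    paths is a list of (offset_samples, gain, hrtf) tuples, where
    offset_samples is the start of the convolution process section
    relative to the direct wave. Positions covered by no HRTF stay zero.
    """
    length = max(offset + len(hrtf) for offset, _, hrtf in paths)
    composite = [0.0] * length
    for offset, gain, hrtf in paths:
        for i, coefficient in enumerate(hrtf):
            composite[offset + i] += gain * coefficient
    return composite


# Hypothetical data: a direct wave and one reflected wave starting 3 samples later.
composite = composite_hrtf([
    (0, 1.0, [1.0, 0.5]),    # section CP0, gain G0 = 1
    (3, 0.5, [1.0, -1.0]),   # section CP1, gain G1 = 0.5
])
```

Here the sample between the two sections remains zero, corresponding to the data zero employed outside the convolution sections. A single convolution of the audio signal with this composite gives the same result as convoluting each path separately and adding, which is why a change of any delay or gain requires the whole composite to be regenerated.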

Other Examples of Convolution Method

With both of the above-mentioned first and second examples, a normalized HRTF regarding the corresponding direction measured beforehand is convoluted into an audio signal at each of the convolution process sections CP0, CP1, CP2, and CP3, regarding a direct wave and selected reflected waves.

Note however that what is important is the convolution start points in time of HRTFs regarding selected reflected waves and the convolution process sections CP1, CP2, and CP3, and accordingly, the signal actually convoluted need not be the corresponding HRTF.

Specifically, for example, with the above-mentioned first and second examples, at the convolution process section CP0 for a direct wave, a normalized HRTF regarding a direct wave (direct wave direction HRTF) is convoluted, but at the convolution process sections CP1, CP2, and CP3 for reflected waves, HRTFs attenuated by multiplying the same direct wave direction HRTF as at the convolution process section CP0 by the employed gain G1, G2, and G3 may be convoluted in a simplified manner, respectively.

Specifically, in the case of the first example, the same normalized HRTF regarding a direct wave as that in the normalized HRTF memory 513 is stored in the normalized HRTF memory 523, 533, and 543 beforehand. Alternatively, an arrangement may be made wherein the normalized HRTF memory 523, 533, and 543 are omitted, only the normalized HRTF memory 513 is provided, and the normalized HRTF for a direct wave is read out from the relevant normalized HRTF memory 513 and supplied to the gain adjustment units 524, 534, and 544 as well as the gain adjustment unit 514 at each of the convolution process sections CP1, CP2, and CP3.

Further, similarly, with the above-mentioned first and second examples, at the convolution process section CP0 for a direct wave, a normalized HRTF regarding a direct wave (direct wave direction HRTF) is convoluted, but at the convolution process sections CP1, CP2, and CP3 for reflected waves, an audio signal obtained by delaying an audio signal serving as a convolution target by the corresponding delay amounts DL1, DL2, and DL3 may be convoluted in a simplified manner, respectively. Specifically, holding units are provided, which are configured to hold an audio signal serving as a convolution target by the above-mentioned delay amounts DL1, DL2, and DL3 respectively, and the audio signals held at the holding units are convoluted at the convolution process sections CP1, CP2, and CP3 for reflected waves, respectively.

Example of Acoustic Reproduction System Employing HRTF Convolution Method (FIGS. 16 through 18)

Next, an HRTF convolution method according to an embodiment of the present invention will be described with reference to an example of application to a reproduction device capable of reproduction using virtual sound image localization, by applying the present embodiment to a case wherein a multi-surround audio signal is reproduced by employing headphones.

An example described below is a case wherein the placements of 7.1 channel multi-surround speakers conforming to ITU (International Telecommunication Union)-R are assumed, and an HRTF is convoluted such that the audio components of each channel are subjected to virtual sound image localization on the disposed positions of the 7.1 channel multi-surround speakers.

FIG. 15 illustrates an example of the placements of 7.1 channel multi-surround speakers conforming to ITU-R, wherein the speaker of each channel is disposed on the circumference with a listener position Pn as the center.

In FIG. 15, C, which is the front position of a listener, is a speaker position of the center channel. With the speaker position C of the center channel as the center, LF and RF, which are positions separated from each other by a 60-degree angle range on both sides thereof, indicate the speaker positions of a left front channel and right front channel, respectively.

Subsequently, in a range of 60 degrees through 150 degrees on the left and right of the front position C of the listener, a pair of speaker positions LS and LB, and a pair of speaker positions RS and RB are set on the left side and right side. These speaker positions LS and LB, and RS and RB are to be set in symmetrical positions as to the listener. The speaker positions LS and RS are speaker positions of a left lateral channel and right lateral channel, and the speaker positions LB and RB are speaker positions of a left rear channel and right rear channel.

With this acoustic reproduction system example, the over-head headphones described above with reference to FIG. 5 are employed, wherein seven headphone drivers are disposed as to each of both ears.

Accordingly, with this example, as shown in the above FIG. 5, in each of the horizontal direction and vertical direction as to the listener, a great number of perceived sound source positions are determined with a predetermined resolution, for example, such as for each 10-degree angle interval, and with regard to each of the great number of perceived sound source positions thereof, a normalized HRTF regarding each of the seven headphone drivers is obtained.

Subsequently, when 7.1 channel multi-surround audio signals are reproduced acoustically with the over-head headphones of the present example, a selected normalized HRTF is convoluted into the audio signal of each channel of the 7.1 channel multi-surround audio signals such that the 7.1 channel multi-surround audio signals are reproduced acoustically with the direction of each of the speaker positions C, LF, RF, LS, RS, LB, and RB in FIG. 15 as a virtual sound image localization direction.

FIGS. 16 and 17 illustrate a hardware configuration example of the acoustic reproduction system. The drawing is divided into FIGS. 16 and 17 because, owing to the size of the paper, it is difficult to illustrate the acoustic reproduction system of the present example within one sheet, so FIG. 17 is the continuation of FIG. 16.

Note that in FIGS. 16 and 17, the audio signals of the channels to be supplied to the speaker positions C, LF, RF, LS, RS, LB, and RB in FIG. 15 are denoted with the same symbols C, LF, RF, LS, RS, LB, and RB. Here, in FIGS. 16 and 17, an LFE (Low Frequency Effect) channel is a low-pass effect channel. This is audio of which the sound image localization direction is not determined, and accordingly, with this example, this channel is an audio channel not employed as a convolution target of an HRTF.

As shown in FIG. 16, the 7.1 channel signals, i.e., audio signals of eight channels of LF, LS, RF, RS, LB, RB, C, and LFE are supplied to A/D converters 73LF, 73LS, 73RF, 73RS, 73LB, 73RB, 73C, and 73LFE through level adjustment units 71LF, 71LS, 71RF, 71RS, 71LB, 71RB, 71C, and 71LFE, and amplifiers 72LF, 72LS, 72RF, 72RS, 72LB, 72RB, 72C, and 72LFE, and are converted into digital audio signals, respectively.

As shown in FIG. 17, with the present example, seven headphone drivers 90L1, 90L2, 90L3, 90L4, 90L5, 90L6, and 90L7 for the left ear are employed for a crosstalk channel xRF of the right front channel, for the left lateral channel LS, for the left front channel LF, for the left rear channel LB, for the center channel C, for the low-pass effect channel LFE, and for a crosstalk channel xRS of the right lateral channel, respectively.

Also, seven headphone drivers 90R1, 90R2, 90R3, 90R4, 90R5, 90R6, and 90R7 for the right ear are employed for a crosstalk channel xLF of the left front channel, for the right lateral channel RS, for the right front channel RF, for the right rear channel RB, for the center channel C, for the low-pass effect channel LFE, and for a crosstalk channel xLS of the left lateral channel, respectively.

With the present example, an arrangement is made wherein the audio signal for the center channel C, and the audio signal for the low-pass effect channel LFE are generated in common and supplied to the left and right headphone drivers 90L5 and 90R5, and headphone drivers 90L6 and 90R6, respectively. As described above, with the acoustic reproduction system shown in FIGS. 16 and 17, 12 channels worth are generated as audio signals to be supplied to the respective headphone drivers for both ears of the over-head headphones.
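The count of 12 channels can be checked by tabulating the driver assignments described above. The dictionary names below are hypothetical; the driver and channel symbols are those of the present example, with xLF taken as the crosstalk channel of the left front channel.

```python
# Channel supplied to each headphone driver for the left ear.
LEFT_EAR_DRIVERS = {
    "90L1": "xRF", "90L2": "LS", "90L3": "LF", "90L4": "LB",
    "90L5": "C", "90L6": "LFE", "90L7": "xRS",
}

# Channel supplied to each headphone driver for the right ear.
RIGHT_EAR_DRIVERS = {
    "90R1": "xLF", "90R2": "RS", "90R3": "RF", "90R4": "RB",
    "90R5": "C", "90R6": "LFE", "90R7": "xLS",
}

# C and LFE are generated once and shared by both ears, so they count once.
distinct_channels = set(LEFT_EAR_DRIVERS.values()) | set(RIGHT_EAR_DRIVERS.values())
shared_channels = set(LEFT_EAR_DRIVERS.values()) & set(RIGHT_EAR_DRIVERS.values())
```

Fourteen drivers are thus fed by 12 distinct audio signals, because only the center channel C and the low-pass effect channel LFE are common to both ears.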

As shown in FIG. 16, with the present example, 12 channels worth of HRTF convolution processing units 74 xRF, 74LS, 74LF, 74LB, 74 xRS, 74LFE, 74C, 74 xLS, 74RB, 74RF, 74RS, and 74 xLF are provided.

The HRTF convolution processing unit 74 xRF is for the crosstalk channel xRF of the right front channel, HRTF convolution processing unit 74LS is for the left lateral channel LS, HRTF convolution processing unit 74LF is for the left front channel LF, HRTF convolution processing unit 74LB is for the left rear channel LB, HRTF convolution processing unit 74 xRS is for the crosstalk channel xRS of the right lateral channel, HRTF convolution processing unit 74LFE is for the low-pass effect channel LFE, HRTF convolution processing unit 74C is for the center channel C, HRTF convolution processing unit 74 xLS is for the crosstalk channel xLS of the left lateral channel, HRTF convolution processing unit 74RB is for the right rear channel RB, HRTF convolution processing unit 74RF is for the right front channel RF, HRTF convolution processing unit 74RS is for the right lateral channel RS, and HRTF convolution processing unit 74 xLF is for the crosstalk channel xLF of the left front channel.

With the present example, the HRTF convolution processing units 74 xRF, 74LS, 74LF, 74LB, 74 xRS, 74LFE, 74C, 74 xLS, 74RB, 74RF, 74RS, and 74 xLF have the same hardware configuration such as shown in FIG. 18.

In the case of the present example, as shown in FIG. 5, with regard to a sound wave for measurement from one perceived sound source position direction, an HRTF is measured at each of the seven microphones corresponding to the seven headphone drivers, and each is normalized as described above, thereby obtaining seven normalized HRTFs. Subsequently, the obtained seven normalized HRTFs are convoluted into seven audio signals to be supplied to the headphone drivers corresponding to the microphones for measurement, respectively.

Therefore, the HRTF convolution processing units 74 xRF, 74LS, 74LF, 74LB, 74 xRS, 74LFE, 74C, 74 xLS, 74RB, 74RF, 74RS, and 74 xLF are, as shown in FIG. 18, configured of seven normalized HRTF convolution units 101, 102, 103, 104, 105, 106, and 107 regarding the audio signals of the seven channels excluding the LFE channel, and an adder 108 configured to add the outputs from the seven normalized HRTF convolution units 101 through 107, respectively.

Each of the seven normalized HRTF convolution units 101 through 107 executes convolution processing of a normalized HRTF as to an input audio signal thereof. As the hardware configuration of each of the seven normalized HRTF convolution units 101 through 107, the hardware configuration of the first example in FIG. 12 may be employed, or the hardware configuration of the second example in FIG. 14 may be employed.

With each of the HRTF convolution processing units 74 xRF, 74LS, 74LF, 74LB, 74 xRS, 74LFE, 74C, 74 xLS, 74RB, 74RF, 74RS, and 74 xLF, each of selected normalized HRTFs to be convoluted (normalized HRTFs regarding a direct wave and reflected waves) to localize a virtual sound image as the reproduction sound field of the 7.1 channel multi surround is convoluted.

Note that, with the present example, the HRTF convolution processing unit 74LFE does not perform convolution processing of an HRTF, but inputs the audio signal of the low-pass effect channel and outputs this without change.

The output audio signals from the HRTF convolution processing units 74 xRF, 74LS, 74LF, 74LB, 74 xRS, 74LFE, 74C, 74 xLS, 74RB, 74RF, 74RS, and 74 xLF are, as shown in FIG. 17, supplied to D/A converters 76 xRF, 76LS, 76LF, 76LB, 76 xRS, 76LFE, 76C, 76 xLS, 76RB, 76RF, 76RS, and 76 xLF through level adjustment units 75 xRF, 75LS, 75LF, 75LB, 75 xRS, 75LFE, 75C, 75 xLS, 75RB, 75RF, 75RS, and 75 xLF, and are converted into analog audio signals, respectively.

The analog audio signals from the D/A converters 76 xRF, 76LS, 76LF, 76LB, 76 xRS, 76LFE, 76C, 76 xLS, 76RB, 76RF, 76RS, and 76 xLF are supplied to current-to-voltage converters 77 xRF, 77LS, 77LF, 77LB, 77 xRS, 77LFE, 77C, 77 xLS, 77RB, 77RF, 77RS, and 77 xLF, and are converted into voltage signals from the current signals, respectively.

Subsequently, the audio signals converted into voltage signals from the current-to-voltage converters 77 xRF, 77LS, 77LF, 77LB, 77 xRS, 77LFE, 77C, 77 xLS, 77RB, 77RF, 77RS, and 77 xLF are subjected to level adjustment at level adjustment units 78 xRF, 78LS, 78LF, 78LB, 78 xRS, 78LFE, 78C, 78 xLS, 78RB, 78RF, 78RS, and 78 xLF, following which they are supplied to gain adjustment units 79 xRF, 79LS, 79LF, 79LB, 79 xRS, 79LFE, 79C, 79 xLS, 79RB, 79RF, 79RS, and 79 xLF, and are subjected to gain adjustment, respectively.

Subsequently, output audio signals from the gain adjustment units 79 xRF, 79LS, 79LF, 79LB, and 79 xRS are supplied to the headphone drivers 90L1, 90L2, 90L3, 90L4, and 90L7 for the left ear through amplifiers 80L1, 80L2, 80L3, 80L4, and 80L7, respectively.

Also, output audio signals from the gain adjustment units 79 xLS, 79RB, 79RF, 79RS, and 79 xLF are supplied to the headphone drivers 90R7, 90R4, 90R3, 90R2, and 90R1 for the right ear through amplifiers 80R7, 80R4, 80R3, 80R2, and 80R1, respectively.

Also, an output audio signal from the gain adjustment unit 79C is supplied to the headphone driver 90L5 through an amplifier 80L5, and is also supplied to the headphone driver 90R5 through an amplifier 80R5. Further, an output audio signal from the gain adjustment unit 79LFE is supplied to the headphone driver 90L6 through an amplifier 80L6, and is also supplied to the headphone driver 90R6 through an amplifier 80R6.

Example of Normalized HRTF Convolution Start Timing with Acoustic Reproduction System (FIGS. 19 through 27)

Next, description will be made regarding normalized HRTFs to be convoluted at the HRTF convolution processing units 74 xRF, 74LS, 74LF, 74LB, 74 xRS, 74LFE, 74C, 74 xLS, 74RB, 74RF, 74RS, and 74 xLF in FIG. 16, and the convolution start timing thereof.

For example, convolution of HRTFs will be described assuming a room of a rectangular parallelepiped shape of vertical × horizontal = 4550 mm × 3620 mm, and a reproduction acoustic space of 7.1 channel multi surround conforming to ITU-R wherein the distance between the left front speaker position LF and right front speaker position RF is 1600 mm. Note that, with regard to reflected waves, ceiling reflection and floor reflection will be omitted, and only wall reflection will be described here to simplify description.

With the present embodiment, a normalized HRTF regarding a direct wave, a normalized HRTF regarding the crosstalk components thereof, a normalized HRTF regarding a primary reflected wave, and a normalized HRTF regarding the crosstalk components thereof will be convoluted.

First, in order to set the right front speaker position RF to a virtual sound image localization position, the directions of sound waves regarding normalized HRTFs may be employed such as shown in FIG. 19.

Specifically, in FIG. 19, RFd denotes a direct wave from the position RF, and xRFd denotes crosstalk to the left channel thereof. Note that the symbol x denotes crosstalk. This applies to the following drawings as well.

Also, RFsR denotes a reflected wave primarily reflected at the right side wall from the position RF, and xRFsR denotes crosstalk to the left channel thereof. Also, RFfR denotes a reflected wave primarily reflected at the front wall from the position RF, and xRFfR denotes crosstalk to the left channel thereof. Also, RFsL denotes a reflected wave primarily reflected at the left side wall from the position RF, and xRFsL denotes crosstalk to the left channel thereof. Further, RFbR denotes a reflected wave primarily reflected at the rear wall from the position RF, and xRFbR denotes crosstalk to the left channel thereof.

With regard to each of a direct wave and crosstalk thereof, and a reflected wave and crosstalk thereof, the normalized HRTFs to be convoluted are normalized HRTFs measured regarding the directions from which those sound waves are finally incident on the listener position Pn.

Specifically, the normalized HRTFs to be convoluted are the seven normalized HRTFs measured corresponding to the seven headphone drivers, respectively, as to a sound wave in one direction. Subsequently, each of the seven normalized HRTFs is convoluted into the audio signal of the channel to be supplied to the corresponding headphone driver.

Subsequently, points in time to start convolution of normalized HRTFs of the direct wave RFd and crosstalk xRFd thereof, and reflected waves RFsR, RFfR, RFsL, and RFbR and crosstalk xRFsR, xRFfR, xRFsL, and xRFbR thereof, as to the audio signal of the right front channel RF are calculated from the path lengths of the sound waves thereof, and the calculation results such as shown in FIG. 20 are obtained.
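As an illustration of how a start point in time follows from a path length, the following sketch computes first-order wall reflection path lengths with mirror-image sources and converts a path length into a delay in samples. The mirror-image construction, the speed of sound of 343 m/s, the 48 kHz sampling rate, the coordinate convention, and all function names are assumptions made for this example only; the embodiment states only that the start points are calculated from the path lengths.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # assumed value, not given in the description


def delay_in_samples(path_length_m, sample_rate_hz=48000, c=SPEED_OF_SOUND_M_S):
    """Convert a propagation path length into a convolution start offset in samples."""
    return round(path_length_m / c * sample_rate_hz)


def first_order_wall_paths(source, listener, room_size):
    """Path lengths of the direct wave and the four first-order wall reflections.

    source, listener: (x, y) in metres; room_size: (width, depth) in metres.
    Assumed layout: left wall at x = 0, right wall at x = width,
    rear wall at y = 0, front wall at y = depth.
    """
    sx, sy = source
    width, depth = room_size
    # Mirroring the source in a wall gives a point whose straight-line distance
    # to the listener equals the length of the reflected path.
    images = {
        "direct": (sx, sy),
        "left side wall": (-sx, sy),
        "right side wall": (2 * width - sx, sy),
        "rear wall": (sx, -sy),
        "front wall": (sx, 2 * depth - sy),
    }
    return {name: math.dist(point, listener) for name, point in images.items()}
```

The difference between the delay of a reflected wave and that of the direct wave would then give the offset between their convolution start points.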

Subsequently, with regard to the gain of a normalized HRTF to be convoluted, the attenuation amount for a direct wave is set to zero. Also, the attenuation amount for reflected waves is set according to a perceived degree of sound absorption.
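The combination of start timing and gain can be sketched as follows, assuming that each normalized HRTF is held as a time-domain impulse response and that a gain of 1.0 corresponds to an attenuation amount of zero. The function compose_response and the tuple layout are hypothetical; this is only one way to picture placing each normalized HRTF at its start point with its gain, not the hardware configuration of the embodiment.

```python
import numpy as np


def compose_response(components, length):
    """Place gain-scaled impulse responses at their start samples and add them.

    components: iterable of (start_sample, gain, impulse_response) tuples,
    where gain 1.0 is assumed for the direct wave and a smaller value for
    each reflected wave. length: number of samples in the composed response.
    """
    response = np.zeros(length)
    for start, gain, impulse_response in components:
        ir = np.asarray(impulse_response, dtype=float)
        if start >= length:
            continue  # this component would begin after the end of the buffer
        end = min(start + len(ir), length)
        response[start:end] += gain * ir[: end - start]
    return response
```

Convoluting the composed response into an audio signal is equivalent to convoluting each normalized HRTF separately from its own start point and adding the results, since convolution is linear.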

Note that FIG. 20 simply illustrates points in time to start convolution of normalized HRTFs of the direct wave RFd and crosstalk xRFd thereof, and reflected waves RFsR, RFfR, RFsL, and RFbR and crosstalk xRFsR, xRFfR, xRFsL, and xRFbR thereof, as to the audio signal, but does not illustrate the convolution start point of a normalized HRTF to be convoluted into an audio signal to be supplied to the headphone driver for one channel.

Specifically, each of the normalized HRTFs of the direct wave RFd and crosstalk xRFd thereof, and reflected waves RFsR, RFfR, RFsL, and RFbR and crosstalk xRFsR, xRFfR, xRFsL, and xRFbR thereof is convoluted at the HRTF convolution processing unit for the channel selected beforehand from the above-mentioned HRTF convolution processing units 74 xRF, 74LS, 74LF, 74LB, 74 xRS, 74LFE, 74C, 74 xLS, 74RB, 74RF, 74RS, and 74 xLF.

The same relation between the normalized HRTFs to be convoluted and the audio signal serving as the convolution target holds not only when the right front speaker position RF is set to a virtual sound image localization position, but also when the speaker position of another channel is set to a virtual sound image localization position.

Next, in order to set the left front speaker position LF to a virtual sound image localization position, the directions of sound waves regarding normalized HRTFs to be convoluted can be taken as those obtained by moving the drawing shown in FIG. 19 to the left side in a symmetrical manner. Though these will not be shown in the drawing, a direct wave LFd and crosstalk xLFd thereof, a reflected wave LFsL from the left side wall and crosstalk xLFsL thereof, a reflected wave LFfL from the front wall and crosstalk xLFfL thereof, a reflected wave LFsR from the right side wall and crosstalk xLFsR thereof, and a reflected wave LFbL from the rear wall and crosstalk xLFbL thereof are obtained. Subsequently, normalized HRTFs to be convoluted are determined according to the incident directions of these as to the listener position Pn, and the convolution start points in time thereof are the same as those shown in FIG. 20.
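The left-right symmetry used here can be expressed compactly on the wave labels themselves. The following sketch assumes the labelling convention of FIG. 19, in which the letters L and R occur only as left/right indicators; the helper mirror_label is hypothetical and serves only to show that the left front labels follow from the right front labels by exchanging L and R.

```python
# Translation table that exchanges the letters L and R in a single pass.
_SWAP_LEFT_RIGHT = str.maketrans("LR", "RL")


def mirror_label(label):
    """Return the label of the left-right mirrored sound wave."""
    return label.translate(_SWAP_LEFT_RIGHT)


# Labels given for the right front speaker position RF in FIG. 19.
right_front_labels = ["RFd", "xRFd", "RFsR", "xRFsR", "RFfR", "xRFfR",
                      "RFsL", "xRFsL", "RFbR", "xRFbR"]
left_front_labels = [mirror_label(name) for name in right_front_labels]
```

Because the room and speaker placement are symmetrical, the mirrored waves have the same path lengths, which is why the start points of FIG. 20 can be reused.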

Also, similarly, in order to set the center speaker position C to a virtual sound image localization position, the directions of sound waves regarding normalized HRTFs to be convoluted are such as shown in FIG. 21.

Specifically, the directions of sound waves regarding normalized HRTFs to be convoluted are a direct wave Cd, a reflected wave CsR from the right side wall and crosstalk xCsR thereof, and a reflected wave CbR from the rear wall. Only the reflected wave on the right side is illustrated in FIG. 21, but the left side can also be set similarly, i.e., a reflected wave CsL from the left side wall and crosstalk xCsL thereof, and a reflected wave CbL from the rear wall.

Subsequently, normalized HRTFs to be convoluted are determined according to the incident directions of the direct wave and reflected waves, and crosstalk thereof, as to the listener position Pn, and the convolution start points in time thereof are as shown in FIG. 22.

Next, in order to set the right lateral speaker position RS to a virtual sound image localization position, the directions of sound waves regarding normalized HRTFs to be convoluted are such as shown in FIG. 23.

Specifically, a direct wave RSd and crosstalk xRSd thereof, a reflected wave RSsR from the right side wall and crosstalk xRSsR thereof, a reflected wave RSfR from the front wall and crosstalk xRSfR thereof, a reflected wave RSsL from the left side wall and crosstalk xRSsL thereof, and a reflected wave RSbR from the rear wall and crosstalk xRSbR thereof are obtained. Subsequently, normalized HRTFs to be convoluted are determined according to the incident directions of these as to the listener position Pn, and the convolution start points in time thereof are as shown in FIG. 24.

In order to set the left lateral speaker position LS to a virtual sound image localization position, the directions of sound waves regarding normalized HRTFs to be convoluted can be taken as those obtained by moving the drawing shown in FIG. 23 to the left side in a symmetrical manner. Though these will not be shown in the drawing, a direct wave LSd and crosstalk xLSd thereof, a reflected wave LSsL from the left side wall and crosstalk xLSsL thereof, a reflected wave LSfL from the front wall and crosstalk xLSfL thereof, a reflected wave LSsR from the right side wall and crosstalk xLSsR thereof, and a reflected wave LSbL from the rear wall and crosstalk xLSbL thereof are obtained. Subsequently, normalized HRTFs to be convoluted are determined according to the incident directions of these as to the listener position Pn, and the convolution start points in time thereof are the same as those shown in FIG. 24.

Also, in order to set the right rear speaker position RB to a virtual sound image localization position, the directions of sound waves regarding normalized HRTFs to be convoluted are such as shown in FIG. 25.

Specifically, a direct wave RBd and crosstalk xRBd thereof, a reflected wave RBsR from the right side wall and crosstalk xRBsR thereof, a reflected wave RBfR from the front wall and crosstalk xRBfR thereof, a reflected wave RBsL from the left side wall and crosstalk xRBsL thereof, and a reflected wave RBbR from the rear wall and crosstalk xRBbR thereof are obtained. Subsequently, normalized HRTFs to be convoluted are determined according to the incident directions of these as to the listener position Pn, and the convolution start points in time thereof are as shown in FIG. 26.

In order to set the left rear speaker position LB to a virtual sound image localization position, the directions of sound waves regarding normalized HRTFs to be convoluted can be taken as those obtained by moving the drawing shown in FIG. 25 to the left side in a symmetrical manner. Though these will not be shown in the drawing, a direct wave LBd and crosstalk xLBd thereof, a reflected wave LBsL from the left side wall and crosstalk xLBsL thereof, a reflected wave LBfL from the front wall and crosstalk xLBfL thereof, a reflected wave LBsR from the right side wall and crosstalk xLBsR thereof, and a reflected wave LBbL from the rear wall and crosstalk xLBbL thereof are obtained. Subsequently, normalized HRTFs to be convoluted are determined according to the incident directions of these as to the listener position Pn, and the convolution start points in time thereof are the same as those shown in FIG. 26.

Description has been made so far regarding the directions of a direct wave and reflected waves for which normalized HRTFs should be convoluted, and the convolution start timing thereof. FIG. 27 illustrates an example of which of the HRTF convolution processing units 74 xRF, 74LS, 74LF, 74LB, 74 xRS, 74LFE, 74C, 74 xLS, 74RB, 74RF, 74RS, and 74 xLF executes the convolution processing of these normalized HRTFs.

With the present example, FIG. 27A illustrates the convolution start timing of normalized HRTFs regarding a direct wave and reflected waves and crosstalk thereof to be convoluted at the HRTF convolution processing unit 74 xRF which is for the crosstalk channel xRF of the right front channel.

Though normalized HRTFs regarding a direct wave and reflected waves and crosstalk thereof to be convoluted at the HRTF convolution processing unit 74 xLF which is for the crosstalk channel xLF of the left front channel are not shown in the drawing, normalized HRTFs obtained by inverting both sides of the direct wave and reflected waves and crosstalk thereof shown in FIG. 27A are convoluted from the same start timing as the convolution start timing shown in FIG. 27A.

FIG. 27B illustrates the convolution start timing of normalized HRTFs regarding a direct wave Cd to be convoluted at the HRTF convolution processing unit 74C which is for the center channel C. That is to say, with the present example, only the normalized HRTF regarding the direct wave Cd of the center channel is convoluted at the HRTF convolution processing unit 74C.

FIG. 27C illustrates the convolution start timing of normalized HRTFs regarding a direct wave LFd to be convoluted at the HRTF convolution processing unit 74LF which is for the left front channel LF. That is to say, with the present example, only the normalized HRTF regarding the direct wave LFd of the left front channel is convoluted at the HRTF convolution processing unit 74LF.

Though not shown in the drawing, only the normalized HRTF regarding the direct wave RFd of the right front channel is convoluted at the HRTF convolution processing unit 74RF which is for the right front channel RF as well.

FIG. 27D illustrates the convolution start timing of normalized HRTFs regarding a direct wave and reflected waves to be convoluted at the HRTF convolution processing unit 74LB which is for the left rear channel LB.

Though not shown in the drawing, with the HRTF convolution processing unit 74RB which is for the right rear channel RB, normalized HRTFs obtained by inverting both sides of the direct wave and reflected waves shown in FIG. 27D are convoluted from the same start timing as the convolution start timing shown in FIG. 27D.

FIG. 27E illustrates the convolution start timing of normalized HRTFs regarding a direct wave LSd to be convoluted at the HRTF convolution processing unit 74LS which is for the left lateral channel LS. That is to say, with the present example, only the normalized HRTF regarding the direct wave LSd of the left lateral channel is convoluted at the HRTF convolution processing unit 74LS.

Though not shown in the drawing, only the normalized HRTF regarding the direct wave RSd of the right lateral channel is convoluted at the HRTF convolution processing unit 74RS which is for the right lateral channel RS as well.

FIG. 27F illustrates the convolution start timing of normalized HRTFs regarding a direct wave and reflected waves and crosstalk thereof to be convoluted at the HRTF convolution processing unit 74 xRS which is for the crosstalk channel xRS of the right lateral channel.

Though normalized HRTFs regarding a direct wave and reflected waves and crosstalk thereof to be convoluted at the HRTF convolution processing unit 74 xLS which is for the crosstalk channel xLS of the left lateral channel are not shown in the drawing, normalized HRTFs obtained by inverting both sides of the direct wave and reflected waves and crosstalk thereof shown in FIG. 27F are convoluted from the same start timing as the convolution start timing shown in FIG. 27F.

Note that, as mentioned earlier, the above description regarding convolution of normalized HRTFs for a direct wave and reflected waves has been made regarding only wall reflection, but may be applied to ceiling reflection and floor reflection in completely the same way.

Specifically, FIG. 28 illustrates ceiling reflection and floor reflection to be considered, for example, when convoluting HRTFs to set the right front speaker RF to a virtual sound image localization position. Specifically, there can be considered a reflected wave RFcR reflected at the ceiling and input to the right ear position, similarly a reflected wave reflected at the ceiling and input to the left ear position, a reflected wave RFgR reflected at the floor and input to the right ear position, and similarly a reflected wave RFgL reflected at the floor and input to the left ear position. Also, with regard to these reflected waves, though not shown in the drawing, crosstalk can be considered.

With regard to these reflected waves and crosstalk thereof as well, the normalized HRTFs to be convoluted are normalized HRTFs measured regarding the directions from which these sound waves are finally incident on the listener position Pn. Subsequently, the path length regarding each of the reflected waves is calculated, and the convolution start timing of each of the normalized HRTFs is determined. Subsequently, the gain of each of the normalized HRTFs is set to an attenuation amount according to the degree of sound absorption perceived from the material, surface shape, and the like of the ceiling and floor.
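The path length and gain of a ceiling or floor reflection can be sketched in the same manner as for wall reflection. The following example assumes mirror-image sources above the ceiling and below the floor, and assumes that the degree of sound absorption is given as an energy absorption coefficient, so that the amplitude gain is the square root of the reflected energy fraction. Both function names, the coordinate convention, and this choice of gain model are assumptions for illustration; the embodiment does not prescribe them.

```python
import math


def ceiling_and_floor_paths(source, listener, room_height):
    """Path lengths of the first-order ceiling and floor reflections.

    source, listener: (x, y, z) in metres with the floor at z = 0 and the
    ceiling at z = room_height (assumed convention).
    Returns (ceiling_path_m, floor_path_m).
    """
    sx, sy, sz = source
    ceiling_image = (sx, sy, 2 * room_height - sz)  # source mirrored in the ceiling
    floor_image = (sx, sy, -sz)                     # source mirrored in the floor
    return math.dist(ceiling_image, listener), math.dist(floor_image, listener)


def reflection_gain(absorption_coefficient):
    """Amplitude gain of one reflection for an assumed energy absorption coefficient."""
    if not 0.0 <= absorption_coefficient <= 1.0:
        raise ValueError("absorption coefficient must lie between 0 and 1")
    return math.sqrt(1.0 - absorption_coefficient)
```

A surface that absorbs three quarters of the incident energy would, under this assumption, give a gain of 0.5 for the corresponding normalized HRTF.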

Configuration Example of Second Example of Acoustic Reproduction System (FIG. 29)

The acoustic reproduction system shown in FIGS. 16 and 17 is the case wherein 7.1 channel multi surround audio signals are reproduced acoustically by the over-head headphones including seven headphone drivers for each ear.

On the other hand, another example described below is a case wherein 7.1 channel multi surround audio signals are reproduced acoustically by common over-head headphones including one headphone driver for each ear.

Let us say that the example described below employs, as shown in FIG. 5, normalized HRTFs measured by disposing seven microphones in the vicinity of each ear for 7.1 channel multi surround. Therefore, the processing up to convolution of the normalized HRTFs can be regarded as completely the same processing as with the above-mentioned acoustic reproduction system. Specifically, let us say that the hardware configuration shown in FIG. 16 is also employed with the acoustic reproduction system according to the present example.

With the acoustic reproduction system according to the present example, as shown in FIG. 29, the audio signals from the level adjustment units 75 xRF, 75LS, 75LF, 75LB, 75 xRS, 75LFE, and 75C are supplied to an adder 110L for the left channels, and are added.

Also, the audio signals from the level adjustment units 75LFE, 75C, 75 xLS, 75RB, 75RF, 75RS, and 75 xLF are supplied to an adder 110R for the right channels, and are added.
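The two adders of FIG. 29 can be pictured with the following sketch, which assumes that the twelve level-adjusted signals are held as equal-length NumPy arrays in a dictionary keyed by channel suffix. The dictionary layout and the function mix_to_two_channels are hypothetical; the groupings follow the inputs listed for the adders 110L and 110R, with the center and LFE signals supplied to both.

```python
import numpy as np

# Channel suffixes of the level adjustment units feeding each adder.
LEFT_ADDER_INPUTS = ("xRF", "LS", "LF", "LB", "xRS", "LFE", "C")
RIGHT_ADDER_INPUTS = ("LFE", "C", "xLS", "RB", "RF", "RS", "xLF")


def mix_to_two_channels(signals):
    """Add the convolved channel signals into one left and one right signal.

    signals: dict mapping a channel suffix such as "xRF" to a 1-D array;
    all arrays are assumed to have the same length.
    """
    left = np.sum([signals[name] for name in LEFT_ADDER_INPUTS], axis=0)    # adder 110L
    right = np.sum([signals[name] for name in RIGHT_ADDER_INPUTS], axis=0)  # adder 110R
    return left, right
```

Adding before D/A conversion means that only one converter, one current-to-voltage converter, and one amplifier are needed per ear in this second example.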

Subsequently, output signals from the adders 110L and 110R are supplied to D/A converters 111L and 111R, and are converted into analog audio signals, respectively. The analog audio signals from the D/A converters 111L and 111R are supplied to current-to-voltage converters 112L and 112R, and are converted into voltage signals from the current signals, respectively.

Subsequently, the audio signals converted into voltage signals from the current-to-voltage converters 112L and 112R are subjected to level adjustment at level adjustment units 113L and 113R, following which they are supplied to gain adjustment units 114L and 114R, and are subjected to gain adjustment, respectively.

Subsequently, output audio signals from the gain adjustment units 114L and 114R are supplied to a headphone driver 120L for the left ear, and a headphone driver 120R for the right ear, through amplifiers 115L and 115R, and are reproduced in an acoustic manner, respectively.

According to the second example of the acoustic reproduction system, a 7.1 channel multi surround sound field can be reproduced well with virtual sound image localization by the headphones including one headphone driver for each ear.

Advantages of the Embodiment

With the related art, in the case of performing signal processing using HRTFs, properties of the measurement system were not removed, so the sound quality following the final convolution processing deteriorated unless good-sounding, expensive speakers and microphones were used for measurement. On the other hand, with the normalized HRTFs according to the present embodiment, properties of the measurement system can be removed, so HRTF convolution processing with no deterioration in sound quality can be performed even when using a measurement system with inexpensive speakers and microphones that do not have flat properties.
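The removal of measurement system properties can be illustrated with the following sketch, which assumes that normalization is carried out as a complex division in the frequency domain of the measured HRTF by the natural-state transfer property measured with the same speaker and microphone. The function normalize_hrtf, its parameters, and the small guard value eps are assumptions made for this example and are not taken from the embodiment.

```python
import numpy as np


def normalize_hrtf(measured_hrtf_ir, natural_state_ir, n_fft=None, eps=1e-12):
    """Return the normalized HRTF as a one-sided frequency response.

    measured_hrtf_ir: impulse response measured with the dummy head or human present.
    natural_state_ir: impulse response measured with no head present.
    n_fft: FFT length (assumed parameter); defaults to the longer input length.
    """
    n = n_fft or max(len(measured_hrtf_ir), len(natural_state_ir))
    hrtf = np.fft.rfft(measured_hrtf_ir, n)
    natural = np.fft.rfft(natural_state_ir, n)
    # Guard against division by a vanishing reference bin.
    natural = np.where(np.abs(natural) < eps, eps, natural)
    return hrtf / natural
```

If both measurements contain the same speaker and microphone response as a common factor, that factor cancels in the division, which is the effect described above.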

Further, while ideal (completely flat) properties are elusive no matter how expensive the speakers and microphones may be and how good their properties are, with this embodiment HRTFs more ideal than any according to the related art can be obtained.

Also, HRTFs regarding only direct waves, with reflected waves eliminated, are obtained with various directions as to the listener, for example, as the virtual sound source position, so HRTFs regarding sound waves from each direction can be easily convoluted into the audio signals, and the reproduced sound field when convoluting the HRTFs regarding the sound waves for each direction can be readily verified.

That is to say, as described above, an arrangement may be made wherein, with the virtual sound image localization set to a particular position, not only HRTFs regarding direct waves from the virtual sound image localization position but also HRTFs regarding sound waves from a direction which can be assumed to be reflected waves from the virtual sound image localization position are convoluted, and the reproduced sound field can be verified, so as to perform verification such as which reflected waves of which direction are effective for virtual sound image localization, and so forth.

Other Embodiments

While the above description has been made regarding a case wherein headphones are primarily the electro-acoustic conversion unit for performing acoustic reproduction of audio signals to be reproduced, the embodiment can also be applied to cases where speakers are the output system, such as front surround and so forth, taking into consideration the measurement method and processing contents.

The acoustic reproduction system employing the multi surround method has been described so far, but it goes without saying that the above embodiment can be applied to common two-channel stereo.

Also, it goes without saying that the above embodiment can be applied to other multi surround cases such as 5.1 channels, 9.1 channels, and so forth other than 7.1 channels.

Also, the placements of 7.1 channel multi-surround speakers have been described with the placements of ITU-R speakers as an example, but it can be readily understood that the above embodiment can be applied to a case of the placements of speakers recommended by THX Ltd.

It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and alterations may occur depending on design requirements and other factors insofar as they are within the scope of the appended claims or the equivalents thereof.

What is claimed is:
 1. A head-related transfer function convolution device configured, when an audio signal is reproduced acoustically by electro-acoustic conversion means disposed in a nearby position of both ears of a listener, to convolute a head-related transfer function into said audio signal, such that a sound image is localized in a perceived virtual sound image localization position, said head-related transfer function convolution device comprising: a storage unit configured to store, when a sound source is disposed in said virtual sound image localization position, and sound-collecting means are disposed in the position of said electro-acoustic conversion means, a measured direct wave direction head-related transfer function regarding a direction of a direct wave from said sound source to said sound-collecting means, and a measured reflected wave direction head-related transfer function regarding a direction of selected one reflected wave or reflected wave direction head-related transfer functions regarding directions of selected plurality of reflected waves, from said sound source to said sound-collecting means; and convolution means configured to read out said obtained direct wave direction head-related transfer function, and said reflected wave direction head-related transfer function regarding the direction of said selected one reflected wave or said reflected wave direction head-related transfer functions regarding the directions of said selected plurality of reflected waves, and convolute into said audio signal.
 2. The head-related transfer function convolution device according to claim 1, wherein with said convolution means, corresponding convolution of said direct wave direction head-related transfer function and said reflected wave direction head-related transfer functions is executed upon a time series signal of said audio signal from each of a first start point in time to start convolution processing of said direct wave direction head-related transfer function, and a second start point in time to start convolution processing of each of reflected wave direction head-related transfer functions, determined according to a path length of sound waves from said virtual sound image localization position and the position of said electro-acoustic conversion means of each of said direct waves and said reflected waves.
 3. The head-related transfer function convolution device according to claim 1, wherein with said convolution means, with regard to said reflected wave direction head-related transfer functions, gain is adjusted according to an attenuation rate of sound waves at a perceived reflected portion, and said convolution is executed.
 4. The head-related transfer function convolution device according to claim 1, wherein said direct wave direction head-related transfer function and said reflected wave direction head-related transfer functions are normalized head-related transfer functions having been obtained by placing acousto-electric conversion means nearby both ears of the listener where placement of electro-acoustic conversion means is assumed, picking up first sound waves emitted at a perceived sound source position with said acousto-electric conversion means in a state where a dummy head or a human exists at said listener position, measuring a head-related transfer function from only the first sound waves directly reaching said acousto-electric conversion means, picking up second sound waves emitted at the perceived sound source position with said acousto-electric conversion means in a state where no dummy head or human exists at said listener position, and normalizing the head-related transfer function with a natural-state transfer property measured from only the first or second sound waves directly reaching said acousto-electric conversion means.
 5. A head-related transfer function convolution device configured, when an audio signal is reproduced acoustically by electro-acoustic conversion means disposed in a nearby position of both ears of a listener, to convolute a head-related transfer function into said audio signal, such that a sound image is localized in a perceived virtual sound image localization position, said head-related transfer function convolution device comprising: means configured to perform convolution processing of the head-related transfer function, measured by a sound source being disposed in said virtual sound image localization position, and sound-collecting means being disposed in the position of said electro-acoustic conversion means, regarding a direction of a direct wave from said sound source to said sound-collecting means, as to said audio signal from a direct wave convolution start point in time set beforehand to start convolution as to said audio signal, as direct wave convolution data; and means configured to perform convolution processing of the head-related transfer function or head-related transfer functions, measured by the sound source being disposed in said virtual sound image localization position, and sound-collecting means being disposed in the position of said electro-acoustic conversion means, regarding a direction of a selected reflected wave or directions of selected plurality of reflected waves from said sound source to said sound-collecting means, as to said audio signal from one or plurality of reflected wave convolution start points in time set beforehand to start convolution as to said audio signal, as reflected wave convolution data.
 6. The head-related transfer function convolution device according to claim 5, wherein said direct wave convolution data is a direct wave direction head-related transfer function, measured by the sound source being disposed in said virtual sound image localization position, and said sound-collecting means being disposed in the position of said electro-acoustic conversion means, regarding the direction of the direct wave from said sound source to said sound-collecting means; and wherein said reflected wave convolution data is a reflected wave direction head-related transfer function, measured by the sound source being disposed in said virtual sound image localization position, and said sound-collecting means being disposed in the position of said electro-acoustic conversion means, regarding the direction of the selected reflected wave or the directions of selected plurality of reflected waves from said sound source to said sound-collecting means.
 7. The head-related transfer function convolution device according to claim 5, wherein said direct wave convolution data is a direct wave direction head-related transfer function, measured by the sound source being disposed in said virtual sound image localization position, and said sound-collecting means being disposed in the position of said electro-acoustic conversion means, regarding the direction of the direct wave from said sound source to said sound-collecting means; and wherein said reflected wave convolution data is data obtained by attenuating said direct wave direction head-related transfer function according to said one or plurality of reflected wave convolution start points in time.
 8. The head-related transfer function convolution device according to claim 5, wherein said direct wave convolution data is a direct wave direction head-related transfer function, measured by the sound source being disposed in said virtual sound image localization position, and said sound-collecting means being disposed in the position of said electro-acoustic conversion means, regarding the direction of a direct wave from said sound source to said sound-collecting means; and wherein said reflected wave convolution data is data obtained by delaying said audio data according to said one or plurality of reflected wave convolution start points in time.
 9. A head-related transfer function convolution device configured, when an audio signal is reproduced acoustically by an electro-acoustic conversion unit disposed in a nearby position of both ears of a listener, to convolute a head-related transfer function into said audio signal, such that a sound image is localized in a perceived virtual sound image localization position, said head-related transfer function convolution device comprising: a storage unit configured to store, when a sound source is disposed in said virtual sound image localization position, and a sound-collecting unit is disposed in the position of said electro-acoustic conversion unit, a measured direct wave direction head-related transfer function regarding a direction of a direct wave from said sound source to said sound-collecting unit, and a measured reflected wave direction head-related transfer function regarding a direction of selected one reflected wave or reflected wave direction head-related transfer functions regarding directions of selected plurality of reflected waves, from said sound source to said sound-collecting unit; and a convolution unit configured to read out said obtained direct wave direction head-related transfer function, and said reflected wave direction head-related transfer function regarding the direction of said selected one reflected wave or said reflected wave direction head-related transfer functions regarding the directions of said selected plurality of reflected waves, and convolute into said audio signal.
 10. A head-related transfer function convolution device configured, when an audio signal is reproduced acoustically by an electro-acoustic conversion unit disposed in a nearby position of both ears of a listener, to convolute a head-related transfer function into said audio signal, such that a sound image is localized in a perceived virtual sound image localization position, said head-related transfer function convolution device comprising: a unit configured to perform convolution processing of the head-related transfer function, measured by a sound source being disposed in said virtual sound image localization position, and a sound-collecting unit being disposed in the position of said electro-acoustic conversion unit, regarding a direction of a direct wave from said sound source to said sound-collecting unit, as to said audio signal from a direct wave convolution start point in time set beforehand to start convolution as to said audio signal, as direct wave convolution data; and a unit configured to perform convolution processing of the head-related transfer function or head-related transfer functions, measured by the sound source being disposed in said virtual sound image localization position, and the sound-collecting unit being disposed in the position of said electro-acoustic conversion unit, regarding a direction of a selected reflected wave or directions of selected plurality of reflected waves from said sound source to said sound-collecting unit, as to said audio signal from one or plurality of reflected wave convolution start points in time set beforehand to start convolution as to said audio signal, as reflected wave convolution data.